dgraph-io / dgraph

The high-performance database for modern applications
https://dgraph.io

Product Roadmap 2020 #4724

Closed manishrjain closed 4 years ago

manishrjain commented 4 years ago

Here's the product roadmap for 2020.

We have listed the features we plan to focus on in Q1 and Q2 (the first half of 2020). For the rest, we'll assess their ETAs as we reach mid-year. Tell us what more you'd like to see happen in 2020!

manishrjain commented 4 years ago

I know a bunch of folks have been looking for Gremlin support. We're currently focused on GraphQL, but if you need Gremlin to work with Dgraph, show your support by upvoting this comment and we'll consider prioritizing it in a few months.

smkhalsa commented 4 years ago

@manishrjain Thanks for the update and for the wonderful work you and the dgraph team are doing.

I'm glad to see that a fully managed SaaS option is on the roadmap.

One question: I don't see any specific mention of exposing full GraphQL+- functionality in spec-compliant GraphQL queries (something like what neo4j-graphql-js offers). There was an indication from the dgraph team shortly after the graphql api was launched that this was coming soon. If that's the case, can you give a preview of how that might work?

manishrjain commented 4 years ago

exposing full GraphQL+- functionality in spec-compliant GraphQL queries

We're considering multiple ways of doing this:

  1. Figuring out ways to port +- functionality over in a GraphQL spec-compliant way, so a GraphQL user can construct these queries directly (fuzzy matching, full-text search, the has function).

  2. Adding automatically generated GraphQL functions for the more complex +- functionality (for example, email: string @index(exact) @upsert could have an upsertEmail function generated automatically).

  3. If a feature can't be translated into GraphQL, giving the user a way to specify a function name along with the +- query it maps to. This is the closest to the resolver pattern exposed via Apollo. It would be done at the schema level, rather than in any particular programming language (Apollo's resolvers are typically JavaScript-based), to maximize compatibility across languages.

For now, we're going with easy wins, that is 1 and 2. Once we have covered and exhausted ways to port +- into 1 or 2, we can look into 3.
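Option 3 above can be pictured as a lookup table from a function name to a +- query template. The following is only a rough sketch of that idea, not how Dgraph implements or plans to implement it: the registry `CUSTOM_FUNCTIONS`, the function name `similarNames`, and the helper `render_custom_query` are all hypothetical, and plain string templating is used purely for illustration (a real implementation would pass user input as query variables rather than splicing it into the query text).

```python
from string import Template

# Hypothetical registry: GraphQL-facing function name -> GraphQL+- query template.
# Neither the names nor the template format come from Dgraph itself.
CUSTOM_FUNCTIONS = {
    "similarNames": Template(
        '{ q(func: match(name, "$term", $distance)) { uid name } }'
    ),
}


def render_custom_query(function_name: str, **args: object) -> str:
    """Look up a registered function and fill in its GraphQL+- template."""
    template = CUSTOM_FUNCTIONS.get(function_name)
    if template is None:
        raise KeyError(f"no GraphQL+- mapping registered for {function_name!r}")
    # substitute() raises KeyError if a placeholder is left unfilled.
    return template.substitute({key: str(value) for key, value in args.items()})
```

Because the mapping is plain data, it could live next to the schema and be used from any client language, which is the compatibility argument made in option 3.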

vikram-ma commented 4 years ago

@manishrjain Where can I find more details to understand what these goals really mean?

Relaxe111 commented 4 years ago

Hello, it looks like multi-tenancy support has been added as an Enterprise feature. I would kindly ask you to reconsider and either make this feature open source or offer a more flexible pricing plan. Locking this feature into the Enterprise version could be a big barrier to wide adoption of dgraph; in the EU, at least, it certainly will be.

manishrjain commented 4 years ago

All the enterprise features (including multi-tenancy) would be automatically included in the SaaS offering. That should allow for flexible, pay as you go sort of pricing.

Ludicrous mode -- The idea is to allow a mode of Dgraph that gives up some "correctness" guarantees to achieve maximum performance. For a lot of people, if Dgraph doesn't give them the speed they need, they revert to a NoSQL database, which provides very little in terms of consistency / transactional guarantees. This mode would allow Dgraph to run with lower guarantees, but at a faster speed.

Query Planner: Dgraph doesn't do much query planning right now. It executes the queries in the same order they're given. Of course, we could do a better job by having an internal query planner which can alter the ordering of tasks to achieve better performance.
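As a toy illustration of what "altering the ordering of tasks" could mean, the sketch below sorts filter tasks so that the one expected to match the fewest nodes runs first, which shrinks the working set for later tasks. This is an assumption about one possible planning heuristic, not a description of Dgraph's planner; `FilterTask`, `estimated_matches`, and `plan` are made-up names, and the estimates are assumed to come from some index statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterTask:
    predicate: str
    estimated_matches: int  # assumed to come from index statistics


def plan(tasks: list[FilterTask]) -> list[FilterTask]:
    """Order filter tasks so the most selective one runs first."""
    # sorted() is stable, so tasks with equal estimates keep their given order.
    return sorted(tasks, key=lambda task: task.estimated_matches)
```

For example, given filters on `country` (1,000,000 estimated matches) and `email` (1 estimated match), `plan` would run the `email` filter first even if the query listed `country` first.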

Do you guys have any checks in place to control GraphQL (and graphql+-) query complexity

Not sure what you mean.

vikram-ma commented 4 years ago

Clients can issue arbitrarily complex GraphQL queries, i.e. the client controls the query complexity. This means a client can issue very complex queries that take minutes or hours to complete and impact dgraph's ability to serve other requests.

Consider a valid client making a very complex query that significantly slows down dgraph or impacts its ability to serve other clients' requests.

Are there checks in place to detect/prevent/handle this?

willem520 commented 4 years ago

Hello, it looks like multi-tenancy support has been added as an Enterprise feature. I would kindly ask you to reconsider and either make this feature open source or offer a more flexible pricing plan. Locking this feature into the Enterprise version could be a big barrier to wide adoption of dgraph; in the EU, at least, it certainly will be.

Hi, multi-tenancy is a popular feature; many developers like us need it to improve our projects. Personally, I hope it could be free. Thanks.

manishrjain commented 4 years ago

Are there checks in place to detect/prevent/handle this?

A client can specify a context with a timeout, which can shut the query down once it runs too long. But apart from that, nothing prevents this right now. Once we have a query planner and can calculate the cost of running a query, we can do a better job of rejecting expensive queries.
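The timeout idea can be sketched as a deadline that is checked between units of work, so a long-running query is abandoned partway instead of running to completion. The code below is a simplified model of that pattern, not Dgraph's or any client library's actual mechanism: `execute_with_deadline`, `DeadlineExceeded`, and the notion of a query as a list of `steps` are assumptions made for illustration.

```python
import time
from typing import Callable


class DeadlineExceeded(Exception):
    """Raised when the query runs past its deadline."""


def execute_with_deadline(
    steps: list[Callable[[], object]],
    timeout_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> list[object]:
    """Run query steps in order, aborting once the deadline has passed."""
    deadline = clock() + timeout_seconds
    results = []
    for step in steps:
        # Check before each step; a step that has already started is not interrupted.
        if clock() > deadline:
            raise DeadlineExceeded(f"query exceeded {timeout_seconds}s")
        results.append(step())
    return results
```

Note that this only bounds how long a single query may run. It does not estimate cost up front, which is the part that would need the query planner mentioned above.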

nmabhinandan commented 4 years ago

manishrjain commented 4 years ago

A startup program offering an evaluation license of the enterprise edition (or better yet, SaaS credits).

Already there. Every Dgraph instance comes with a month of free enterprise trial.

GeoJson support

Already there. Dgraph has been supporting geo queries since the early days. In fact, some users have said that Dgraph's geo support is better than PostGIS (we haven't verified).

Relaxe111 commented 4 years ago

I think SaaS with multi-tenancy included is not what the majority of developers are really looking for. By a more flexible pricing plan I mean gradual licensing. For example, a company could choose and pay only for the Enterprise features it is interested in; if I need only one feature, I would not be happy paying for a full license while using only that one feature.

ghost commented 4 years ago

My company is in the position of exploring different graph DBs to modernize our stack, and Dgraph is the best fit in every category (especially looking at this 2020 roadmap), except for lack of Gremlin support (we would like to port our existing queries, and eventually switch). Excited to see how this roadmap progresses!

manishrjain commented 4 years ago

Generally asking the community here, for Gremlin support, I'm curious if that's really a deal breaker. The port of queries from Gremlin to GraphQL is probably a one-time effort -- and you get the benefit of new, easy to use tech, JSON support, with a growing ecosystem of tools and editors to support creating queries, exploring data, etc. (GraphQL has so many editors).

ghost commented 4 years ago

Oh, porting is definitely something we would be open to doing. It's that we are working under time pressure, we know that what we have works, and a few of our members are experienced with Gremlin. Not to mention that, with Gremlin support, we can switch backends as needed. So that is very specific to us, but it would help teams who are exploring dgraph decide early on whether it is a good fit by letting them use what they already have.

brianbroderick commented 4 years ago

I mentioned this in #2693 but wanted to make sure people see this.

First, I want to say that I love Dgraph and am an evangelist for your product. I've given several talks and am constantly trying to get people interested in Dgraph.

There's one last hurdle to getting actual adoption, and that's the ability to have multiple schemas in a dev & test environment.

Most companies in my area use MySQL, MariaDB, or Postgres. Having many schemas is therefore something people take for granted and are not willing to pay for.

It's a challenge to get people to switch from something they are comfortable with. Making this as painless as possible is the only way to get widespread adoption.

There are many reasons to have multiple schemas; for example, it's typical to have a dev, test, and prod environment with their respective schemas. This makes it so the test database can be recreated before each test run. Right now, the only way to accomplish this is to either have multiple instances of Dgraph running, or to add a prefix to all predicates.

If I only want to clear test environment predicates, adding a prefix complicates queries like this: &api.Operation{DropAll: true}, which I run before any tests. It would also complicate Go structs when determining the right predicate values in JSON.
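To make the cost of the prefix workaround concrete, here is a small sketch. The naming scheme `<env>.<predicate>` and the helpers `prefixed` and `predicates_to_drop` are assumptions for illustration, not anything Dgraph prescribes. Instead of a single drop-all before a test run, the setup has to work out which predicates belong to the test environment and drop them one by one.

```python
def prefixed(env: str, predicate: str) -> str:
    """Build an environment-scoped predicate name, e.g. 'test.name'."""
    return f"{env}.{predicate}"


def predicates_to_drop(env: str, all_predicates: list[str]) -> list[str]:
    """Select only the predicates that belong to the given environment."""
    prefix = f"{env}."
    return [p for p in all_predicates if p.startswith(prefix)]
```

Every query, mutation, and struct tag would also have to go through something like `prefixed`, which is the extra complexity described above.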

It's also typical to work on many micro services at a time, but these micro services should not have any chance of data colliding with each other; they should be completely isolated. It doesn't seem realistic to have 10+ instances of DGraph running at the same time on a laptop (5+ micro services, each with a dev and test environment)

Therefore, I want to second everyone's comments about supporting multiple schemas in the free version. The SaaS offering isn't going to help my dev and test environments on my laptop.

Relaxe111 commented 4 years ago

This is absolutely true! That's why I kindly asked you to reconsider making multi-tenancy open source! I have been trying for the last year to introduce dgraph at my company, but the biggest barrier to convincing my boss is multi-tenancy support. Last week we had another discussion, and again everyone at my company was skeptical of dgraph because the community version lacks multi-tenancy. I can confirm that my company isn't willing to pay for multi-tenancy. Having to run multiple DB instances for different environments (dev/test) is an issue with dgraph. I just hope that the owners of such an amazing product will understand that multi-tenancy will not be an argument for companies to buy a dgraph Enterprise license, whereas open-source multi-tenancy will be an argument to adopt dgraph. A company that already uses dgraph is more likely to switch to Enterprise than a company on well-known traditional DBs is to switch to dgraph Enterprise, or to open-source dgraph without multi-tenancy. I truly believe that the wider dgraph's adoption, the larger its Enterprise customer base will be. But without open-source multi-tenancy, the majority of potential future Enterprise users will pass on adopting dgraph as open source now.

marvin-hansen commented 4 years ago

My company isn't going to pay for multi-schema / multi-tenancy because any other OSS DB already brings it to the table. Charging for something you get for free in most OSS DBs, and that is legally mandated in certain industries or countries, is just completely ridiculous. Please start listening to your customers!

Please add GPU acceleration or support for in-DB machine learning to bring at least some tangible value to the enterprise version that would justify a purchase.

https://github.com/dgraph-io/dgraph/issues/4678

https://github.com/dgraph-io/dgraph/issues/4608

bronzels commented 4 years ago

Hope Gremlin support in Q1 pls.

jorroll commented 4 years ago

Generally asking the community here, for Gremlin support, I'm curious if that's really a deal breaker. The port of queries from Gremlin to GraphQL is probably a one-time effort -- and you get the benefit of new, easy to use tech, JSON support, with a growing ecosystem of tools and editors to support creating queries, exploring data, etc. (GraphQL has so many editors).

@manishrjain I imagine most people currently investigating dgraph are people with existing graph db needs and, historically, many existing graph dbs use gremlin. Personally, having used Cypher, Gremlin, and SQL (GraphQL too, though I've never seen it used for directly querying a db), as well as some proprietary APIs like Firestore, I'd say that Gremlin is by far the worst (and one of the reasons why graph dbs are a niche product). I can appreciate someone asking for support because porting an app over to a new language can be a huge undertaking, but, long term, I really hope Gremlin dies in favor of other languages (e.g. the upcoming GQL standard). Providing tooling to help port existing Gremlin apps to a newer query language might be a compromise.

I'm speculating, but I think one challenge for dgraph could be that, historically, graph database usage is mostly confined to backend engineers. It seems likely that most of the current dgraph users are backend folks as well. This would contrast with GraphQL which is mostly a frontend query language (tho obviously it can be used server side as well). From my perspective, one of the most exciting aspects of dgraph is the idea that maybe in the future, I can use Apollo Client to query the backend directly from the frontend, eliminating a huge chunk of work in building out an API server (similar to what Firestore or Hasura can accomplish). This is probably not something that has any appeal to backend folks though.

shekarm commented 4 years ago

Hi, I investigated different databases and it appears that multi-tenancy is not something you get for free with other databases either. Any implementation of multi-tenancy will require access control lists and other security-related features, and most databases require an enterprise license for those. In some ways, Dgraph is following those models.

Relaxe111 commented 4 years ago

Hello @shekarm, could you please give concrete examples of such databases? Thanks.

Relaxe111 commented 4 years ago

I respectfully disagree with you. Multi-tenancy and ACL are different features that shouldn't be lumped together. In my experience, multi-tenancy is open source in most (if not all) DBs. ACL, however, is offered by far fewer DBs, whether free or Enterprise.

seanlaff commented 4 years ago

Data isolation (which is falling under the multi-tenancy bullet) is critical for dgraph to see success in our company. We have many interested parties, but lacking that feature makes it a non-starter.

If dgraph found a place in our stack, I could see us growing into the enterprise tier (e.g. needing granular ACL, fancier snapshot/restore, etc.); however, lacking rudimentary data isolation in the free tier hampers our ability to start the journey and build PoCs.

Specifically, the risk of schema collision is the real blocker.

shekarm commented 4 years ago

Currently, Dgraph implements multi-tenancy and user authentication as part of our ACL implementation, to validate users and their access credentials. We will look at this implementation and see if it makes sense to isolate the credential authorization required for multi-tenancy.

shekarm commented 4 years ago

On the issue of GPU acceleration, there is a separate issue opened by @marvin-hansen and it is being tracked separately here.

seanlaff commented 4 years ago

@shekarm Thanks for your consideration. I think it maps to the Elasticsearch pattern of supporting multiple indices in the free tier, and then supporting document (and field) level security in the enterprise tier.

marvin-hansen commented 4 years ago

@manishrjain @shekarm

Please consider an in-memory mode to boost performance, as reported in issue https://github.com/dgraph-io/dgraph/issues/4813

GPU acceleration is hard and complex to implement, but an in-memory mode gives about the same performance, requires less complexity, and scales much more cheaply, because adding a few hundred GB more memory costs far less than adding a few more high-end GPUs.

Ludicrous mode -- Idea is to allow a mode of Dgraph, which gives up on some "correctness" things to achieve maximum performance.

Do not sacrifice "correctness" for performance; otherwise, Dgraph ends up being no different. Use a proper in-memory mode like Redis-graph, but actually useful.

larvinloy commented 4 years ago

@thefliik Could you give some examples as to why

Gremlin is by far the worst

jorroll commented 4 years ago

@larvinloy are you asking as a dgraph employee? Or are you just curious? The difference being that the first is "on topic" for this thread, the second is probably "off topic."

larvinloy commented 4 years ago

@thefliik

@larvinloy are you asking as a dgraph employee? Or are you just curious? The difference being that the first is "on topic" for this thread, the second is probably "off topic."

Just curious. Feel free to not respond if it's off topic.

marvin-hansen commented 4 years ago

@larvinloy @thefliik

Gremlin can be used to perform any arbitrary graph query, but it lacks much of the intuitive and clean syntax made available by SPARQL.

As the DGraph engineers have already figured out, GraphQL alone doesn't do the trick of querying an RDF graph effectively; that's why they came up with the +- extension.

GraphQL and its extension still have some way to go, but it's certainly a very welcome addition to have a native GraphQL endpoint in DGraph.

I wouldn't call Gremlin the worst, but I'm still left wondering why DGraph never even considered SPARQL, as it's specifically made for RDF graphs and is one of the very few mature query languages that can uniformly query graphs, relational data, XML, and JSON. Due to its strict predicate namespace, cross-origin queries are a piece of cake in SPARQL, and thus it's pretty useful in complex system integration. At least you don't have the foreign entity mess you have to deal with in an Apollo federation.

However, I haven't seen anything about SPARQL so am I correct to assume that DGraph isn't going into that direction?

iluminae commented 4 years ago

hey guys - excellent work so far.

I would like to say that rudimentary data isolation is an absolute must for me to start using dgraph. For my use case, I would currently have to spin up a dgraph instance for every customer, which is not possible operationally. Customers will have conflicting schemas, which is not something an ACL can fix. Call this multi-tenancy if you wish, but I am actually not interested in ACLs. I need data isolation to the extent other databases give it to me (postgresql, mysql, elasticsearch, etc. all just have another directory on disk segmented by "database"). If each GQL call to dgraph selected one and only one "database", represented by its own directory on disk, without any cross-talk, it would fit my need exactly.
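The kind of isolation requested here can be modeled with a tiny in-memory sketch in which every call names exactly one "database" and schemas never leak across databases. This is only a model of the request, with hypothetical names (`IsolatedStore`, `set_schema`, `schema`); it says nothing about how Dgraph stores data on disk.

```python
class IsolatedStore:
    """Toy model: each database has its own schema, with no cross-talk."""

    def __init__(self) -> None:
        self._databases: dict[str, dict[str, str]] = {}

    def set_schema(self, database: str, predicate: str, type_name: str) -> None:
        """Declare a predicate's type inside one database only."""
        self._databases.setdefault(database, {})[predicate] = type_name

    def schema(self, database: str) -> dict[str, str]:
        """Return a copy of the schema for the selected database."""
        return dict(self._databases.get(database, {}))
```

With this model, one customer can declare `id` as `int` and another as `string` without conflict, which is exactly the case an ACL alone cannot solve.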

larvinloy commented 4 years ago

@marvin-hansen What's intuitive is subjective. From my experience of using Neptune, I'm yet to see a query language as powerful as Gremlin. Two of the big features that I miss in every other graph query language are the ability to set query timeout on individual hops (inside the same query), and the ability to do recursive queries until a certain condition is met (i.e. without having to specify depth).
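The second feature, traversing until a condition is met rather than to a fixed depth, amounts to the loop sketched below. This is a plain breadth-first walk over an adjacency dictionary, written to illustrate the semantics only; `traverse_until` and its arguments are hypothetical names, and it is not how Gremlin or Dgraph execute such queries.

```python
from collections import deque
from typing import Callable, Hashable, Optional


def traverse_until(
    graph: dict[Hashable, list[Hashable]],
    start: Hashable,
    should_stop: Callable[[Hashable], bool],
) -> Optional[Hashable]:
    """Walk outward from start until a node satisfies should_stop.

    No depth is given; the walk ends when the condition holds or when
    every reachable node has been visited. Returns None if nothing matches.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if should_stop(node):
            return node
        for neighbor in graph.get(node, []):
            if neighbor not in seen:  # guard against cycles
                seen.add(neighbor)
                queue.append(neighbor)
    return None
```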

SPARQL might feel cleaner to a lot of folks because of its similarity to SQL, but I'm yet to see a language for graph DBs that is as rich and mature as Gremlin.

marvin-hansen commented 4 years ago

@manishrjain @shekarm @MichelDiz

Please support CSV data import in Dgraph.

Details in ticket https://github.com/dgraph-io/dgraph/issues/4920

fpattyn commented 4 years ago

Can you reconsider adding multi-tenancy to the open source distribution? Being able to define different graphs in one database helps to solve the 'provenance' issue when integrating data from different sources. Every source adds data to a separate graph. It's a cool feature to be able to show where each data source contributed to the complete knowledge graph.

ganisback commented 4 years ago

If dgraph does not support multi-tenancy in the open source distribution, I will have to go back to janusgraph. I think it's a basic feature for a graph database.

willem520 commented 4 years ago

Hi, I want to know when multi-tenancy will be supported in the open source distribution. It is really important to me. I already use dgraph in production, in single-tenancy mode, which means that I have to allocate server resources to each business. If I have 100 businesses, that will take a lot of server resources.

ganisback commented 4 years ago

@Willem520, from this roadmap, they do not plan to support this feature in the community edition; it will be included in the enterprise edition.

Relaxe111 commented 4 years ago

Well, that is not exactly the case. According to earlier comments, they will consider whether it makes sense to open source multi-tenancy. So rather than speculating about this issue, I think it would be better to wait for an official announcement from the Dgraph team. :)

willem520 commented 4 years ago

@Willem520, from this roadmap, they do not plan to support this feature in the community edition; it will be included in the enterprise edition.

I think they will reconsider this feature

emregency commented 4 years ago

Hey all,

grizzly-monkey commented 4 years ago

We were evaluating dgraph for our SaaS application. The planned feature that we would need is multi-tenancy. Most DBs offer this as part of their OSS code, and it is a basic necessity these days. It would be great if you would consider making it part of the open source roadmap.

Thanks, Jeet

willem520 commented 4 years ago

It would be great for dgraph to support ingesting data from Hive, HDFS, and other similar big data stores. Right now, the bulk loader and live loader only support local files.

dihmeetree commented 4 years ago

Most interested in "Single predicate sharded across groups" out of all of the things planned! I think it would be great for scalability and performance :)

liveforeverx commented 4 years ago

Hi, @manishrjain !

I would like to join all other people here (at least 5) with a question about multitenancy.

I think that almost everyone who uses DGraph is interested in this feature, or could profit from it or get a better user experience with DGraph. Even people who just run pet projects and can't afford a full licence would be interested, simply to avoid running a different Dgraph instance for each pet project, and usually 2 Dgraph instances per project (for example, because the dev and test datasets differ and the test one gets wiped; where you would run a single Postgres, you most probably need to run multiple DGraphs per project).

I know people who simply do not take a database seriously without this feature (as I remember, it was the biggest blocker to adoption at my previous company, and it is one of the important points at my current company), and it was one of the most frequent complaints I heard from people at meetups.

It would be great, if you would think about making this feature accessible to everyone, it would be great, if it would be Open Source.

Just some examples:
Neo4j offers it for free: https://neo4j.com/developer/manage-multiple-databases/
ArangoDB offers it for free: https://www.arangodb.com/docs/stable/data-modeling-databases-working-with.html
OrientDB offers it for free: http://orientdb.com/docs/last/OrientDB-REST.html#post---database

I personally do not know of another database that lacks this feature, or one that offers it only in an Enterprise edition.

But even if it were a one-time payment for this feature, at a price acceptable to a sole developer, I would consider buying it for personal use on my development machine to get a better UX from DGraph, like the one I know from every other database I have used personally and professionally. There was an opinion that other databases do not offer it for free; I don't know of any other database that has multiple databases and an open-source version but doesn't offer the feature for free.

My personal example: at the moment, I reset DGraph every time I switch between developing an adaptor for DGraph and my pet project, and after each switch I refill my pet project with data. The tests in my pet project are carefully designed to clean up everything they create, to avoid this problem and to be able to use dev + test on the same database.

P.S. I gave talks in meetup about DGraph and Elixir, I'm the maintainer of the most advanced Elixir driver for Dgraph.

liveforeverx commented 4 years ago

@larvinloy Wouldn't it be better to propose enhancements to GraphQL+- for the unsupported use cases, and push for GraphQL+- to support these cases?

I think, it would be great to add some features to DGraph, which makes Gremlin not a requirement for any new greenfield project, so that GraphQL+- just covers this.

sorenhoyer commented 4 years ago

dgraph looks dope, but if multiple databases / multi-tenancy support is only planned for enterprise customers, I think most people building SaaS applications, at least startups, will just look elsewhere (e.g. ArangoDB), which would be a real shame and possibly a lot of lost revenue (eventually). I'm sure people will be willing to pay once they get a decent number of customers on board, so if you don't want to give it away for free, maybe set a limit of, say, 200, 500 or 1000 databases (1 for each customer/tenant) for the Community version. If you need more, I'm sure you can pay for it. I for one am in that situation right now. I'd definitely have started my first SaaS on dgraph if only you had planned this as a community feature, but in the end I decided to go with Arango because of this :/

derpycoder commented 4 years ago

@manishrjain,

With my limited knowledge of the back-end, some fantasy requirements for the website I am making & a few pages of High-scalability blog: I present to you my humble wishlist.

  1. Rate limiting
  2. Archiving or making older data read-only (Read only data can be compressed with higher compression ratio: e.g. using Z Standard)
  3. Input validation for each predicate
  4. NATS Streaming as buffer
  5. Integration with anything from CNCF
  6. LZ4 for compression instead of Snappy for writable data, as it is faster than Z Standard in both read & write, and uses less CPU power.
  7. ApolloGraphQL Plugin to support GraphQL+-
  8. Subscriptions & Live Queries
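For the first wishlist item, one common way to rate limit requests is a token bucket. The sketch below shows the general technique under assumed names (`TokenBucket`, `allow`); it is not a Dgraph feature or API, and a real deployment would also need per-client buckets and thread safety.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(
        self,
        rate_per_second: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False to reject the request."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```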

mac2000 commented 4 years ago

Wondering if there is any chance of seeing Apollo federation support in the future, e.g. the ability to add new services backed by dgraph into an existing ecosystem with other services, and being able to extend existing entities and so on.

From an implementation perspective, there are really only a few things that need to be done: