Closed: @anshgoyalevil closed this issue 1 year ago.
@anshgoyalevil What do you think could be the best approach here? Are you going to limit the number of characters by a) making sure that if a request field exceeds the maximum length, the request is rejected as a whole, or b) storing only the characters up to the maximum if the input exceeds it?
We will need to discuss this, because this problem is not limited to the Post or Event mutations. It extends to each and every mutation where we take a string as input; for example, in the signUp security bug I was working on, I noticed I could technically submit a very long string.
Could you open a parent issue for this, so that child issues can be assigned to others?
I would go with rejecting the request as a whole, since storing only the characters up to the maximum length would create a data discrepancy.
Yes, I already mentioned that there is no validation in any of the form fields, including the Organization, Post, Event, and User Creation forms.
I don't think this issue needs child issues, since I am planning a strategy to apply the logic to the whole input base, with a maximum string length of 255 characters. There are also other minor conditional checks which need to be added to the mutation resolvers.
If I need any help from you, or if it looks like the codebase is going to change a lot, I will surely ask to divide it into child issues.
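A minimal sketch of what that rejection strategy could look like, assuming the 255-character cap discussed above; the helper names (`assertValidLength`, `InputLengthError`) and the `createPost` shape are illustrative, not the actual talawa-api code:

```typescript
// Assumed cap from the discussion above; illustrative only.
const MAX_STRING_LENGTH = 255;

class InputLengthError extends Error {}

// Option (a) from the discussion: reject the whole request when any
// string field exceeds the maximum, instead of silently truncating.
function assertValidLength(value: string, fieldName: string): void {
  if (value.length > MAX_STRING_LENGTH) {
    throw new InputLengthError(
      `${fieldName} must not exceed ${MAX_STRING_LENGTH} characters`
    );
  }
}

// A resolver would check every string argument before writing anything
// to the database, so no partial data is ever stored.
function createPost(args: { title: string; text: string }): string {
  assertValidLength(args.title, "title");
  assertValidLength(args.text, "text");
  return "post created";
}
```

Because the check runs before any write, there is no truncated record left behind when a field is too long, which avoids the data discrepancy mentioned above.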
@anshgoyalevil sounds good to me
@anshgoyalevil @kb-0311 here's some stuff I wrote a few months ago. It'll give you some more insight. https://docs.google.com/document/d/1BgD354MQukK6pCJ9Xgfpzx5L1o-qq4m19odgfXXJRkU/edit?usp=sharing
Thank you @xoldyckk, will read it in depth.
@xoldyckk It seems you had mentioned the getSort() helper I sent a PR for a long time back. Now that it has been implemented, I guess we can remove that as a problem in your watchlist, along with several others.
Thanks for this btw, it really helps with compartmentalizing the problems.
@kb-0311 The getSort functionality was within the resolvers. I had separated it out into functions because it was cluttering the resolver logic for no reason.
Btw, the current implementation of getSort is not good enough, as it is not type safe.
Talawa-api also uses GraphQL in an anti-pattern way. There should not be so many query fields that resolve connection objects; those connection objects should be nested as relations on the corresponding parent objects.
Everything should be a single graph. To save on useless calculations for parent objects when you only want to fetch a deeply nested child object, you can introduce the node interface pattern.
@xoldyckk So do you mean that instead of just using mongoose's populate function to get connected nodes, the API should fetch the connected objects (the populated fields of the child node needed for a given parent resolver's logic) in a nested manner from the database, in that resolver?
Also, how would we go about making the current getSort() function type safe?
@kb-0311 Yes, that's how GraphQL is supposed to work. If you return everything within the root resolver, you're just wasting resources.
Example:

```
User {
  userName
  posts
  followers
  followees
}
```
With the current approach, if the client asks for the userName field, your resolver also queries the database for all the posts, followers, and followees, which are expensive fields to fetch. If the client needs just the userName field, which is present on the base object, why would you also fetch the other expensive fields? If you had nested resolvers for those expensive fields, GraphQL would only execute them when the client explicitly asks for those fields.
@kb-0311 getSort can be made type safe according to the sorting parameters mongoose accepts. Though mongoose itself is not type safe, so it doesn't really matter, lol. It will accept anything you pass to it as the sort object and only throw errors at runtime, which goes against the philosophy of TypeScript, which is to catch errors at compile time.
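For what it's worth, here is a sketch of how the sort spec could be constrained at compile time, assuming the models are described by TypeScript interfaces; `SortSpec` and this `getSort` signature are hypothetical, not the existing helper:

```typescript
// mongoose-style ascending/descending values.
type SortOrder = 1 | -1;

// Only keys of the model type are accepted, so a typo in the field name
// becomes a compile-time error instead of a silent no-op at runtime.
type SortSpec<T> = Partial<Record<keyof T, SortOrder>>;

interface Post {
  title: string;
  createdAt: Date;
}

function getSort<T>(spec: SortSpec<T>): SortSpec<T> {
  return spec;
}

// OK: `createdAt` is a key of Post.
const sort = getSort<Post>({ createdAt: -1 });
// getSort<Post>({ createdat: -1 }) would fail to compile.
```

This only guards the field names and the 1/-1 values on our side; as noted above, mongoose itself will still accept anything at runtime.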
@xoldyckk Got it :+1: I have been thinking about that too, why we are using the populate function so much.
It does seem that refactoring the codebase into nested queries could be a really good idea.
So basically, since mongoose's sort field is itself not type safe, there is no way to consistently make sure the sorting parameter is type safe? lol.
@kb-0311 Returning everything in the root resolver is a GraphQL anti-pattern, but that's exactly what MongoDB is good at. It sucks at querying relational data and is extremely slow at it, because it is designed purely to store related data within the same document as the parent: you just embed the data inside it. This is unlike relational databases, which have separate tables for each data object, connected through foreign keys.
But if you embed data, you'll have problems later on if you want to perform operations on that embedded data, or if that embedded data is related to other objects. You can also have duplicate objects in MongoDB: the same object, with the same data, embedded in different documents.
You see what I'm getting at. Managing it is a headache, and it's not designed to work with relational data. It's good for just dumping useless data which needs to be fed to some artificial intelligence system or something. It is not ACID compliant. You think you deleted some object? No, it is embedded into 20 other objects as well, and not only at one level: you can have many levels of nesting and embedding.
Add to all this the fact that you have no type safety in code. You don't know what object references what, what's embedded where, how many places you need to mutate it if you want to, etc.
Ohhhkkk, I got what you're trying to say.
So basically, because we are using MongoDB, there are performance downsides, which explains why we need to use a forEach loop in our resolvers to delete the related documents when deleting a parent document.
So what do you suggest as a possible solution?
@kb-0311 MySQL or Postgres. For a hosted service, PlanetScale is the best right now. For an ORM there's Prisma, the best in the market, designed from the ground up for a TypeScript-first experience.
I would recommend Postgres too.
While I was learning GraphQL I was using Postgres as well: https://github.com/kb-0311/civilised-forum-and-notices. lol.
But I highly doubt we would be migrating the database anytime soon, though.
I myself would highly advocate for Postgres and Prisma. I have worked with both of them before, and it's been one of my favourite backend development experiences to date. Another benefit of using relational databases is that you get detailed table structures and relation diagrams that help beginners understand the data and how it works. With MongoDB, you have to rely heavily on your intuition for the same.
@palisadoes there's always room for a hybrid approach, provided we are ready to handle the related complexity and scalability issues.
To implement this, we can try it out for a small, isolated part of our system and figure out how things work out.
@palisadoes Please assign this issue to me.
@palisadoes @xoldyckk what do you think about this??
@palisadoes @aviraldevv @anshgoyalevil @EshaanAgg @kb-0311
There's not a single data model in talawa-api which is non-relational. Non-relational data doesn't exist in applications almost 99% of the time.
First let me tell you this: MongoDB is perfectly fine unless your app is scaling to millions and you need fast operations, or you need to resolve extremely complex relations along with complex sorting and filtering logic. It's the mongoose package which is bad, because it has poor integration with TypeScript. At the very least, that package should be replaced with Prisma.
Prisma has official support for MongoDB, and it is built from the ground up to cater to TypeScript and a good developer experience. Also, the way many models are mapped in mongoose is bad, design-wise, for relational data and type safety. They should be mapped in a normalized way, like relational database models are; there should be no embedding of documents. There should also be strict enforcement of the types and expected values of the model fields according to the business logic.
Now, if you want to know about integrating SQL databases, read below:
The reason I and many other people are recommending the use of a relational database is the performance, safety, reliability, and ACID capabilities they provide. Scalability is not an issue, because it's managed by the DBaaS the user chooses.
There are no use cases where MongoDB would provide value over a SQL database, except for the native GridFS implementation it provides for file storage. Though that also only works with hosting providers that support GridFS functionality in their own MongoDB implementation. And its implementation for storing files is completely different from object storage providers like S3, Google Cloud Storage, etc.; the same is true for the functional logic required for sending/receiving files through it.
If we go with relational databases like MySQL or Postgres, we'll have to use a separate object storage provider like S3, Google Cloud Storage, etc., for file storage. This is not an anti-pattern; it is the industry standard and convention for storing static files.
Also, there's no scenario where a person using the talawa packages can get away without providing their own configuration for a database and storage provider. We're not hosting it for them or anything. Talawa-api should be made integrable with different database providers and storage solutions (at least the popular ones). This functionality would be available through plugins.
Mostly, plugins would only be required to integrate different storage providers, because the database is usually just integrated through a DATABASE_URL environment variable, and talawa-api wouldn't care where and how the database is hosted. Though some recommendations for good database providers can be made in talawa-docs.
I figured that would be the answer. If we were to go all-in on an RDBMS, should that be a 2023 or a 2024 project?
@palisadoes I guess we could make a separate branch for all the talawa-api work that should be done for GSoC 2023. The mobile and admin apps will continue to use the older version of talawa-api, which will stay on the develop branch.
I'll think about it and tell you later.
@palisadoes @xoldyckk @anshgoyalevil @kb-0311 @aviraldevv
My two cents:
We must shift to an RDBMS as soon as possible, for the following reasons:
Switching to Prisma is a big overhaul, but in the process we have to explicitly define the relations and data structures. While designing them, we can work in parallel on updating the Talawa-Docs documentation about these models and their constraints, which is something that we severely lack.
Instead of using a separate branch, I personally would recommend creating a new repository itself. The reasons for this include:
• talawa-api would act as the guiding source of truth for the developers to work on.
• Working with file storage solutions and using different plugins to support different providers should be the last of our concerns.
@DMills27 @JamaicanFriedChicken What are your thoughts?
@EshaanAgg I'm not convinced by your arguments that an SQL database would be absolutely necessary here. The biggest issue with introducing an SQL database would be the cost overhead that arises from the vertical scaling SQL databases require.
• Regarding your first point: while there are a number of schemas that suggest a relational nature, it could be the case that these schemas could be "trimmed down" or refactored to make them "less relational". What I mean is that not all of the relations in our schemas are used to best effect. For instance, Facebook uses a NoSQL database (after switching from an SQL database in its formative years) to manage its social media site, and you could argue that many things in that site are "highly relational" in nature.
• Regarding your second point: you mention the "enforcing of constraints and maintaining the shape of the data", but are you aware that MongoDB has actually been ACID compliant for multi-document transactions since MongoDB 4.0 was released in 2018? (https://www.mongodb.com/transactions)
• Regarding your third point: you state that "NoSQL databases have been designed for applications with high writes, and less reads", but this isn't actually true. In general, NoSQL databases have faster read and write speeds than their SQL counterparts. See https://www.cs.rochester.edu/courses/261/fall2017/termpaper/submissions/06/Paper.pdf
As it relates to the rest of your comments, I think introducing a cache such as Redis may allay the concerns brought up here. I've never used Prisma, but I've heard good things about it, so I can't speak at length on it. One could separate the "relational" aspects of the GraphQL schema and then interact with Redis to produce the desired results in an efficient manner.
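A cache-aside sketch of the Redis idea above; an in-memory `Map` stands in for Redis here so the example is self-contained, and all names are illustrative:

```typescript
// In-memory stand-in for Redis; a real setup would use a Redis client.
const cache = new Map<string, string>();
let dbHits = 0;

// The expensive query we want to avoid repeating.
function slowDbLookup(key: string): string {
  dbHits += 1;
  return `value-for-${key}`;
}

// Cache-aside: check the cache first, fall back to the database,
// then populate the cache for subsequent reads.
function getCached(key: string): string {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = slowDbLookup(key);
  cache.set(key, value); // with Redis you would also set a TTL here
  return value;
}
```

With Redis, an expiry (TTL) and an invalidation strategy on writes would also be needed, but the read path is the same shape.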
@palisadoes @xoldyckk @kb-0311 @aviraldevv @EshaanAgg @DMills27
My thoughts on this:
• Sequelize, if we want to use an RDBMS, since its syntax is very similar to Mongoose and it also has TypeScript support.
• Prisma, as suggested by others above.
• Redis for caching, as suggested by @DMills27.

@palisadoes @DMills27 @xoldyckk @anshgoyalevil @aviraldevv
I don't think we would be migrating databases any time soon nor introducing Prisma soon. However, that could be an idea for GSoC '24. (maybe)
Regarding the computational cost of queries: redis can help. I would suggest batching large queries as well as caching them.
To summarize, if we are delaying the database migration or the Prisma ORM, then maybe this year we can implement this.
Please remember that those on this thread are focusing on Talawa-API. We are in a precarious position with Talawa-Admin, and lots of work needs to be done there. Doing simultaneous major redesigns of two interdependent repos would not be wise.
If you have an hour to spare, I strongly suggest you take a look at this video:
I urge you to find a way to solve this with:
We still have 2024.
@palisadoes implementing DataLoader would not require much schema modification. The API routes and resolvers will stay much the same; only the queries will be cached and, in the case of large queries, batched as well.
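A hand-rolled sketch of the batching idea behind DataLoader (this is not the `dataloader` package itself, which schedules the flush automatically at the end of the current tick; names here are illustrative):

```typescript
let batchCalls = 0;

// One database round-trip resolves the whole batch of keys.
function batchFetchUsers(keys: string[]): string[] {
  batchCalls += 1;
  return keys.map((k) => `user-${k}`);
}

class TinyLoader {
  private queue: { key: string; resolve: (v: string) => void }[] = [];

  // Individual loads (e.g. from many field resolvers) are only queued.
  load(key: string): Promise<string> {
    return new Promise((resolve) => this.queue.push({ key, resolve }));
  }

  // The queued keys are resolved together with a single batch call.
  flush(): void {
    const pending = this.queue;
    this.queue = [];
    const results = batchFetchUsers(pending.map((p) => p.key));
    pending.forEach((p, i) => p.resolve(results[i]));
  }
}

const loader = new TinyLoader();
const a = loader.load("1"); // two loads from different resolvers...
const b = loader.load("2");
loader.flush(); // ...answered by one batched query
```

This is why the resolvers don't need to change shape: each one still asks for a single record, and the loader collapses those requests into one query per batch.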
I agree we should not completely redesign the API project this year and leave it as an upcoming task.
I had seen the call recording; I could not join because it was very early in the morning. I get why the redesign has to be put on hold for a while. There are also a lot of inconsistencies between the client (talawa-admin, talawa mobile app) queries/mutations and those that exist in the API.
@anshgoyalevil has been working with the admin portal, so maybe he can provide some more input on how to mitigate the inconsistencies there. Someone who is familiar with the talawa mobile app will also need to be brought into this conversation.
@kb-0311 Totally agree with you. There are a lot of things which need to be modified/added in the Talawa-API. There are many things in the Talawa-Admin project which are acting as placeholders. For example, Contributors is present in the Menu of the Admin Dashboard but is not implemented in the API, and the Block/Unblock system works very inconsistently.
The Talawa API project needs a lot of work, since there are certain Mutation/Query resolvers which were supposed to exist according to the Talawa Admin project. One such issue I resolved 2-3 days back was related to this: there was an edit post button inside the Admin Panel, but no mutation resolver was present in the API.
Although, according to the video @palisadoes shared, this is also a good thing, since it gives us an idea of what needs to be implemented in the API after looking at the dummy links in the admin panel.
I have mostly worked on the Talawa API so far, and have observed that many things are missing. One such feature is Spam Protection, which was reverted last year due to TypeScript migration issues. It is now included in the 2023 idea list, to be implemented as a Service as a Plugin model.
I hope many such issues will be resolved this year, and that other issues, such as the database migration, can be discussed over a meet next year, i.e., in 2024.
We also have big issues on the API with outdated and deprecated dependencies. The Apollo server update is a big one.
This will need to be added to our GSoC ideas.
Describe the bug
There are no input validations in the resolvers, thus the following validations fail:
• A value such as hello, with no mixed characters, is accepted.
• Create Organization, Create Post, and Create Event have no input validations in any of their fields, thus they can be buffer overflowed, as described in issue https://github.com/PalisadoesFoundation/talawa-admin/issues/463

To Reproduce
Steps to reproduce the behavior:
• Enter a value such as cool
Expected behavior There must be errors thrown for such behavior.
Actual behavior No error is thrown.
Screenshots