Anshumanv28 opened 2 months ago
Let me know your thoughts on this. It could start as a simple script in the repo for any contributor to use, but might serve more use cases in the future with improvements.
Not clear on what the script would look like. Could you break down the steps please?
1. Add your terms to a txt file, say `myterms.txt`.
2. Run the script.
3. The script matches the titles of all existing JSON files in `content/terms` (stripping whitespace and normalizing the case of the file titles) against the terms in `myterms.txt`.

A simple approach, but open to improvements. It makes checking the existence of multiple terms at once possible.
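The steps above could be sketched as a small Python utility. This is only a sketch under assumptions not confirmed in the thread: that entries live as `content/terms/<term>.json` and that `myterms.txt` holds one term per line.

```python
#!/usr/bin/env python3
"""Sketch of the bulk term-checker (assumed paths, see note above)."""
from pathlib import Path

TERMS_DIR = Path("content/terms")   # assumed location of the JSON entries
TERMS_FILE = Path("myterms.txt")    # assumed input file, one term per line


def normalize(name: str) -> str:
    """Strip whitespace and lowercase so 'API ' matches 'api.json'."""
    return name.strip().lower()


def check_terms(terms_file: Path = TERMS_FILE, terms_dir: Path = TERMS_DIR):
    """Split the terms in `terms_file` into (already present, not found)."""
    existing = {normalize(p.stem) for p in terms_dir.glob("*.json")}
    wanted = [line for line in terms_file.read_text().splitlines() if line.strip()]
    present = [t for t in wanted if normalize(t) in existing]
    missing = [t for t in wanted if normalize(t) not in existing]
    return present, missing


if __name__ == "__main__":
    present, missing = check_terms()
    print("Already in the encyclopedia:", *present, sep="\n  ")
    print("Not found (candidates to add):", *missing, sep="\n  ")
```

Matching on filename stems rather than opening each JSON keeps the check fast, but it assumes filenames track entry titles; if they diverge, the script would need to parse the title field out of each file instead.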
Ok. And you're thinking of adding it as a utility file in the project?
@Anshumanv36 but isn't the search feature already doing it? Correct me if I'm wrong with what you are trying to do.
@RayMathew yes exactly. @Sudharshaun the keyword is "bulk". I dropped this idea earlier thinking the same thing, but I'm weighing the benefit for contributors who want to push many words at once. The script could also be extended with more functionality, for example helping with data acquisition via web scraping and LLMs in the future.
Example use case: suppose you have 20-30 terms you want to contribute. Do you go to the website and check each and every word to see if it already exists in the database? Just automate it. We could also have a script that makes JSON files in the required format, with content generated by LLMs, which would help.
@Buzzpy let me know if this would be useful. As you know, Hacktoberfest is approaching, and contributors could find it handy.
@Anshumanv36 wouldn't this run back into the cost bottleneck for the LLM implementation mentioned in #95?
Yes, use case 2 (LLM response to JSON file) would present the quality issues that come with using LLMs, but that assumes contributors choose not to review their entries and just push low-quality content; and even if they do, nothing gets merged to master without review first. The bulk-search use case is still free of bottlenecks. If we can define some flow to verify content quality, the second (LLM) use case will become more suitable, but as @RayMathew mentioned earlier, this will sit as a tool in the utils dir for voluntary use by contributors who want to add terms, until we find a better use for it.
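For the second use case, the JSON-writing half could be kept separate from the LLM call so drafts are always reviewable files before any PR. A minimal sketch, with the caveat that the actual entry schema of `content/terms` isn't shown in this thread, so the `"title"`/`"content"` fields below are placeholders, and the LLM call is deliberately left out:

```python
"""Sketch: write an LLM-drafted (or hand-written) entry as a review-ready JSON file.

The field names are placeholder assumptions, not the project's real schema.
"""
import json
from pathlib import Path


def write_draft_entry(term: str, draft_content: str, out_dir: Path) -> Path:
    """Write a draft entry for `term` using the assumed filename convention."""
    entry = {"title": term, "content": draft_content}  # placeholder schema
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{term.strip().lower()}.json"
    path.write_text(json.dumps(entry, indent=2))
    return path
```

Keeping generation and file-writing decoupled means the same writer works whether the draft came from an LLM, a scrape, or a human, and every draft lands on disk where a contributor can edit it before opening a PR.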
Hello everyone!
Thanks for all the input and ideas! The concept of bulk-checking terms sounds great and could really streamline contributions for everyone. However, since this project is still in its early stages and there are some bottlenecks to consider (like the LLM-related challenges, and cost), I’ll temporarily pause the bulk-checking feature for now.
That said, the idea is definitely valuable! In the meantime, contributors can still add multiple terms one by one using the existing methods. And if anyone finds it tricky or needs help, they can always open an issue and ask—I'll soon update the issue template with info on how to request help.
Thanks again for pushing forward such helpful ideas, and I'm excited to keep evolving this together! Cheers! 🥂
And feel free to continue the chat if needed, I won't close the issue for now.
Proposal: implement a feature that allows bulk checking of terms. This script would enable developers to input multiple terms and check their presence in the encyclopedia all at once.
The goal is to enhance the developer experience for those looking to contribute by simplifying the process of verifying multiple terms. Instead of checking each term individually, contributors could use this script to quickly see which terms are already present and which are not.
This feature aims to streamline the contribution process, making it easier and more efficient for developers to suggest and add multiple terms to the encyclopedia at once.