Open ChakshuGautam opened 1 year ago
Hello @ChakshuGautam , I am interested in contributing to this project. Could you please clarify the feature Alternate Model Evaluation? Does it mean trying out the models used in WikiSQL etc and reporting the results?
Yes @HemanthSai7. But not with their data - it needs to be a complete cycle of training the model for a domain and seeing the results. Can you start with creating test data first? Also, we don't need to do this for all of them, just some promising ones. I am looking at 3 max based on a literature review - the ones that have evolved the most.
Okay, I'll start by reading the wikiSQL paper. Can you please elaborate more on the test data?
Hey @ChakshuGautam! I am looking forward to contributing to this project. It would be helpful if you could guide me in the initial phase.
Hey guys, let me break this down further and share it by EoD today.
@ChakshuGautam waiting for the information and all the details
Hey @ChakshuGautam !
I'm eager to contribute to the Text2SQL project and would appreciate your guidance in the initial phase. My experience in Python, SQL, Django, and SQLAlchemy, along with my research work, will be valuable assets. Looking forward to your assistance.
Hey @ChakshuGautam, I am looking forward to contributing to this project. It would be helpful if you could guide me in the initial phase.
Hey @rishabhv471, you can start by setting the project up in a Gitpod environment or on your local machine. For Gitpod you can follow my video. For local setup you can follow the readme. If you face any issues or have any questions, you can ping the Discord channel or ping me. Will be happy to help. Looking forward to your contribution.
Hey @ChakshuGautam, I have gone through the requirements to be implemented in the upcoming mentorship program.
I am dividing my solution approach into two parts:
- Data Exploration for Viable Filters:
- Test Cases/Benchmarking: To test the effectiveness of the model, we can use the WikiSQL dataset available on Hugging Face. I have uploaded a more detailed approach on the Unstop portal; the GitHub repo link is included there.
Token Count Monitoring: Keep track of the token count in your input text using OpenAI's tiktoken library or similar tools. This helps to estimate the token usage and manage it effectively.
Experiment with Model Parameters: Adjust the max_tokens parameter in the API call to cap the number of tokens the model generates in its completion. Setting a lower value helps keep each request within the desired token budget.
```python
import openai
import tiktoken

# tiktoken has no count()/find() helpers; encode to tokens and slice instead.
encoding = tiktoken.encoding_for_model("text-davinci-003")


def optimize_tokens(text, max_tokens):
    tokens = encoding.encode(text)
    if len(tokens) <= max_tokens:
        # If the text is already within the token limit, return it as-is
        return text
    # Keep the leading tokens, leaving headroom of 3 tokens for the "..." marker
    keep = max(max_tokens - 3, 0)
    return encoding.decode(tokens[:keep]) + "..."


openai.api_key = "YOUR_API_KEY"
input_text = """ This is a very long text that exceeds the token limit of the language model. We need to optimize the tokens to fit within the maximum allowed tokens. """
max_token_limit = 100

optimized_text = optimize_tokens(input_text, max_token_limit)
print("Optimized Text:", optimized_text)

# Legacy Completions call; this interface only exists in openai<1.0
response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=optimized_text,
    max_tokens=max_token_limit,
)
print("Response:", response.choices[0].text)
```
I am looking forward to contributing more to this project. It would be helpful if you could guide me in the initial phase.
@ChakshuGautam Sir, I have submitted the proposal. Looking forward to working under you and contributing to this wonderful project.
Hey guys, I am deleting the non-solution-related messages from here.
Project Details
Text2SQL is an application that allows users to interact with their data using natural language queries. Currently, it only supports SQL-based querying but the implementation is not limited to that. Text2SQL provides APIs to generate the appropriate query (SQL or otherwise) and return the data you need.
Features to be implemented
Token Optimization
Improve token usage with OpenAI
Alternate Models Evaluation
Models to be evaluated
Domain Mapping to Schema
Test Cases/Benchmarking
Add public test cases to test out the current model.
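One way the public test cases could be structured is as question/gold-SQL pairs scored by execution accuracy: run both the gold query and the generated query against the same database and compare the rows. The sketch below is only an illustration under assumptions. The `students` table, the `TEST_CASES` list, and the `generate_sql` callable are hypothetical stand-ins, not part of the Text2SQL codebase, and an in-memory SQLite database substitutes for a real domain database.

```python
import sqlite3

# Hypothetical test cases: a natural-language question plus a gold SQL query.
TEST_CASES = [
    {"question": "How many students are there?",
     "gold_sql": "SELECT COUNT(*) FROM students"},
    {"question": "List the names of students in grade 10.",
     "gold_sql": "SELECT name FROM students WHERE grade = 10"},
]


def build_demo_db():
    """Create a small in-memory database for the hypothetical domain."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, grade INTEGER)"
    )
    conn.executemany(
        "INSERT INTO students (name, grade) VALUES (?, ?)",
        [("Asha", 10), ("Ravi", 9), ("Meera", 10)],
    )
    return conn


def run_query(conn, sql):
    """Return the query's rows in sorted order, or None if the SQL fails."""
    try:
        return sorted(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, cases, generate_sql):
    """Fraction of cases where the generated SQL returns the gold rows."""
    correct = 0
    for case in cases:
        expected = run_query(conn, case["gold_sql"])
        predicted = run_query(conn, generate_sql(case["question"]))
        if predicted is not None and predicted == expected:
            correct += 1
    return correct / len(cases)


def dummy_model(question):
    """Stand-in for a real model: always returns the same count query."""
    return "SELECT COUNT(*) FROM students"


if __name__ == "__main__":
    conn = build_demo_db()
    print("Execution accuracy:", execution_accuracy(conn, TEST_CASES, dummy_model))
```

Because rows are sorted before comparison, this sketch ignores result ordering, so it would not catch a missing `ORDER BY`. Swapping `dummy_model` for a call into each candidate model would let the same cases compare the models side by side.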
Learning Path
Complexity
Complex
Skills Required
Python, Knowledge of HuggingFace Transformers, NLP, SQL, Databases.
Name of Mentors:
@ChakshuGautam
Project size
8 Weeks
Product Set Up
See the setup here
Acceptance Criteria
C4GT
This issue is nominated for Code for GovTech (C4GT) 2023 edition. C4GT is India's first annual coding program to create a community that can build and contribute to global Digital Public Goods. If you want to use Open Source GovTech to create impact, then this is the opportunity for you! More about C4GT here: https://codeforgovtech.in/