hectoritr closed this issue 2 years ago
ISSUES REGARDING THE PHRASES JSON:
Look at this example of a phrase JSON:
{"frase":"Hola Buen día tengo castillo ","frecuencia":3,"complejidad":{"valor":0,"pictos componentes":[{"id":377,"esSugerencia":false},{"id":379,"horario":["MANANA"],"esSugerencia":false,"hora":["MANANA"]},{"id":49,"esSugerencia":false},{"id":945324633}]},"fecha":[1637327873525,1637327994767,1637329129227],"locale":"es","id":0}
Some things we have to decide are:
Set of keys and tags the app is using at the moment:
'key': {'TAGS'}
'hora': {'MANANA', 'MEDIODIA', 'TARDE', 'NOCHE'}
'edad': {'ADULTO', 'JOVEN', 'NINO'}
'sexo': {'FEMENINO', 'MASCULINO', 'BINARIO', 'FLUIDO', 'BINARIO'}
'ubicacion': {'ESTADIO', 'PARQUE',... //we are not using it for now
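To make the open questions concrete, here is a minimal sketch of a checker that runs over the example phrase and the tag list above. The rules it enforces (every picto must carry `esSugerencia`, only the listed keys are allowed) are assumptions for illustration, since deciding them is the point of this issue; `check_phrase` and `ALLOWED_TAGS` are hypothetical names, not something that exists in the app.

```python
import json

# Tag vocabulary copied from the list above. 'ubicacion' is left out because
# it is not in use yet. Treating any other key as a problem is an assumption.
ALLOWED_TAGS = {
    "hora": {"MANANA", "MEDIODIA", "TARDE", "NOCHE"},
    "edad": {"ADULTO", "JOVEN", "NINO"},
    "sexo": {"FEMENINO", "MASCULINO", "BINARIO", "FLUIDO"},
}

# The example phrase from this issue, unchanged.
EXAMPLE = (
    '{"frase":"Hola Buen día tengo castillo ","frecuencia":3,'
    '"complejidad":{"valor":0,"pictos componentes":['
    '{"id":377,"esSugerencia":false},'
    '{"id":379,"horario":["MANANA"],"esSugerencia":false,"hora":["MANANA"]},'
    '{"id":49,"esSugerencia":false},{"id":945324633}]},'
    '"fecha":[1637327873525,1637327994767,1637329129227],"locale":"es","id":0}'
)


def check_phrase(phrase: dict) -> list[str]:
    """Return a list of human-readable problems found in one phrase record."""
    problems = []
    pictos = phrase.get("complejidad", {}).get("pictos componentes", [])
    for picto in pictos:
        pid = picto.get("id")
        if "esSugerencia" not in picto:
            problems.append(f"picto {pid}: missing 'esSugerencia'")
        for key, value in picto.items():
            if key in ("id", "esSugerencia"):
                continue
            if key not in ALLOWED_TAGS:
                problems.append(f"picto {pid}: unknown key '{key}'")
            elif not set(value) <= ALLOWED_TAGS[key]:
                problems.append(f"picto {pid}: unknown value in '{key}'")
    return problems


problems = check_phrase(json.loads(EXAMPLE))
for problem in problems:
    print(problem)
```

On the example this reports the `horario` key on picto 379 (it duplicates `hora`) and the missing `esSugerencia` on picto 945324633.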
Possible Sources for Training Miguel's Algorithm:
EDIT: maybe it's best to work with medical question datasets, which will actually include what our users would say in a medical/hospital context. In that case, the last dataset from the earlier list (https://github.com/curai/medical-question-pair-dataset) might be the best to try first, and we should also add others we can find, such as:
These came up after a really quick Google search, as examples; more thorough research might give even better results.
Possible Sources for Training Miguel's Algorithm:
This might be useful: https://convokit.cornell.edu/documentation/datasets.html
Scientific Dataset: https://www.kaggle.com/datasets/Cornell-University/arxiv/code
The full dataset is REALLY large (1.1TB and growing), but we can download the metadata, which has all the titles, abstracts, authors, categories, etc. With it we can select categories for each model and train on the abstracts, or download some of the papers.
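As a rough sketch of the category selection, assuming the metadata is a JSON-lines file (one record per line) where `categories` is a space-separated string and `abstract` holds the text: the function below streams the records and keeps only the abstracts from the wanted categories. The file name in the comment and the sample records are assumptions for illustration.

```python
import json
from typing import Iterable, Iterator


def abstracts_for_categories(lines: Iterable[str], wanted: set[str]) -> Iterator[str]:
    """Yield the abstract of every record that has at least one wanted category."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: categories look like "cs.CL cs.AI" (space separated).
        categories = set(record.get("categories", "").split())
        if categories & wanted:
            yield record.get("abstract", "").strip()


# Made-up sample records, only to show the shape of the input.
sample_lines = [
    '{"id": "0001", "title": "A", "abstract": " About speech. ", "categories": "cs.CL cs.AI"}',
    '{"id": "0002", "title": "B", "abstract": "About stars.", "categories": "astro-ph.GA"}',
    '{"id": "0003", "title": "C", "abstract": "About neurons.", "categories": "q-bio.NC"}',
]
selected = list(abstracts_for_categories(sample_lines, {"cs.CL", "q-bio.NC"}))
print(selected)

# With the real download, pass the open file instead of the sample list, e.g.
# with open("arxiv-metadata-oai-snapshot.json", encoding="utf-8") as f:
#     for abstract in abstracts_for_categories(f, {"q-bio.NC"}): ...
```

Streaming line by line avoids loading the whole metadata file into memory.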
Download the correct model on login. When the user starts the app and selects the Gender and DoB, we should store those values to be used by the prediction algorithm.
The options currently are:
Gender
Age (calculate the right TAG based on the DoB)
These values should be stored as profile info of the user and used in the prediction.
Then based on the gender you have to download the right JSON dataset.
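A minimal sketch of the DoB-to-TAG step and the dataset lookup, written in Python for brevity rather than in the app's own code. The age cut-offs (`child_max`, `youth_max`) and the file naming scheme in `model_filename` are assumptions, not values agreed in this thread; only the tag names come from the app.

```python
from datetime import date


def age_in_years(dob: date, today: date) -> int:
    """Full years elapsed between dob and today."""
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def age_tag(dob: date, today: date, child_max: int = 12, youth_max: int = 17) -> str:
    """Map a date of birth to one of the 'edad' tags (cut-offs are assumptions)."""
    age = age_in_years(dob, today)
    if age <= child_max:
        return "NINO"
    if age <= youth_max:
        return "JOVEN"
    return "ADULTO"


def model_filename(gender_tag: str, age_tag_value: str) -> str:
    """Hypothetical naming scheme for the JSON dataset to download."""
    return f"frases_{gender_tag}_{age_tag_value}.json".lower()


today = date(2021, 11, 19)
tag = age_tag(date(2004, 11, 20), today)  # birthday is tomorrow, so still 16
print(tag, model_filename("FEMENINO", tag))
```

Storing the DoB and deriving the tag at prediction time, instead of storing the tag, keeps the profile correct as the user gets older.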
@asimjawad is this done? Download the right model according to the user?
@hectoritr we did not do this. Add the required API here and I will be on it.
@lopezjuanma96 add them here.
This was resolved on #101
reopening this issue, because work on some parts was not done.
@asimjawad here is what I found so far
This was on the default database
This was on the testing database
@hectoritr can you explain this a bit further? I will look at it in the morning.
I tried logging in as a female user and changed the default database and not the testing one; I don't know which one you are using in dev. Just that. The main thing is that even though it asked me to choose the gender, it downloaded the male version.
@hectoritr we are using default one.
And as I told you, the JSONs are only loaded when a user creates a new account and chooses their gender. At that time we upload and save the JSON according to their gender.
Describe the solution you'd like
When a new user is logging in, we should provide a trained JSON model for their predictions based on their preferred gender and age.
We are covering several types of genders:
And 3 age types:
So, we would need to create 15 different models according to the possible combinations. @gonojuarez will fetch the latest data from the database.
@lopezjuanma96 will train and provide the different models. We can use Miguel's algorithm but also use the current metadata on the JSON file to improve the model, like the user's age and time of day of the sentence to TAG them.
@asimjawad will do the Flutter implementation.
Comment below when your work is done.
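For the time-of-day tagging mentioned above, a small sketch of how the `fecha` timestamps (milliseconds since the epoch) in a phrase record could be mapped to the `hora` tags. The hour boundaries and the UTC-3 offset are assumptions for illustration; the real boundaries and the user's timezone still need to be agreed.

```python
from datetime import datetime, timedelta, timezone


def hora_tag(epoch_ms: int, tz: timezone = timezone.utc) -> str:
    """Map a millisecond timestamp to a 'hora' tag (hour boundaries are assumptions)."""
    hour = datetime.fromtimestamp(epoch_ms / 1000, tz=tz).hour
    if 5 <= hour < 12:
        return "MANANA"
    if 12 <= hour < 14:
        return "MEDIODIA"
    if 14 <= hour < 20:
        return "TARDE"
    return "NOCHE"


# 'fecha' values from the example phrase JSON in this thread.
fecha = [1637327873525, 1637327994767, 1637329129227]

# Assumption: the user is three hours behind UTC.
user_tz = timezone(timedelta(hours=-3))
tags = [hora_tag(ms, user_tz) for ms in fecha]
print(tags)
```

The same timestamp gives a different tag depending on the timezone, so the offset has to come from the user's device or profile rather than from the server.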