Closed GwennyGit closed 1 year ago
Another point here would be to include annotation checking and see whether the biological qualifier is correct.
The functionality of the script polish_after_MP.py
was extended to include the function change_qualifiers
. This function can handle a list of a specified type and change the qualifier type as well as the biological/model qualifier type according to the user input. There is also the option to only change the qualifiers for a certain database prefix. Additionally, the script could be enhanced to handle multiple inputs for the qualifiers and certain prefix databases, if one would want that.
The function change_qualifiers
could also be a good addition to refineGEMs.
I think we should add change_qualifiers
and polish_annotations
to refineGEMs. MIRIAM compliance is desirable and since you already wrote the script it should be easy to integrate.
Transfer:
polish_annotations
with
improve_curies
improve_curie_per_entity
get_set_of_curies
add_new_curie_set
change_qualifiers
with
change_qualifier_per_entity
add_curie_set
Modify:
add_curie_set
vs. add_new_curie_set
??generate_Cv_term
vs the functions that are already implementedchange_qualifiers
to all entitiesadd_new_curie_set
so that the MIRIAM pattern is updatedresolve_improper_identifiers
? @GwennyGit should we add that or is that too specific? (for now I left it out)add_new_curie_set
: Iterates over a dictionary of database prefix and identifier key-value pairs and adds the complete new curies with the user-provided pattern specification to the model.
add_curie_set
: Iterates over a set of CURIEs already existing in the model and adds these to the newly specified CVTerm. (As for change_qualifiers
only the CVTerm is changed this was deemed enough. However, it could also be rewritten to adjust the pattern according to user input for the CVTerms where the qualifier types are changed)
The generate_cv_term
function is necessary to be able to provide a specific qualifier as well as biological/model qualifier type.
Regarding resolve_improper_identifiers
we could add it to refineGEMs, however, the table used as input was done by hand and if identifiers occur in the user's model but not in the table this table should be extended with these identifiers. The function was created due to warnings that were caused by ModelPolisher, hence, a better way would be to update the BIGG database or the annotateDB or to improve ModelPolisher. I think the main problem is the BIGG database. I have already written issues regarding these topics in the respective repositories.
Ok then we will leave out resolve_improper_identifiers
for now. It can be added later if desired.
Maybe rewriting the functions I wrote for CVTerms to use the generate_cvterm function might make the code more readable. But that is optional since everything works as is.
Regarding the integration of the two new functions into polish
some questions came up for me:
change_qualifiers
so that all entities are checked and changed?polish
function so that they are only used standalone if a user wants them? The branch is up-to-date and we can go in any direction.
In principle, I think it makes the most sense to use the new pattern. It is the current pattern that will work, and I do not know if the old pattern will be deprecated soon. For the ontologies (as I stated in the script) I only implemented the use of the new pattern as the old one did not seem to work for the URIs I checked.
I am not sure if we could check the existing pattern. I mean probably in the function where the dictionary is generated, but that would not account or check for the whole model but only for the current CVTerm that is handled. 🤔
I think integrating change_qualifiers
into polish
would only make sense with the proposed wrapper function. Otherwise, I would leave this function as standalone and maybe add it to the YAML file.
Using change_qualifiers
in polish
to check and change the qualifiers for the model makes sense and would enhance the module in my opinion. However, I would combine this function with the function add_new_curie_set
such that during the check the pattern is changed. In addition, I would keep each of the functions as standalone. What do you think?
Ok, I will write the wrapper so that both functions can be integrated into polish
. The way I added the functions to the module automatically allows the user to use them separately. They will not automatically write to the new model but rather return the changed model so that the user can write it themselves. I will check the polish script again - this script is aimed at inexperienced users that just want to polish their models and that do not care so much for specific functions. change_qualifiers
and polish_annotations
can accessed via refinegems.polish.change_qualifiers()
and refinegems.polish.polish_annotations()
that should be enough. We should also mention them in the documentation or a possible sample script to make them more accessible.
What kind of Qualifiers to we set for the different entities. For model it is obvious but for the gene product for example - do we use BIOLOGICAL_QUALIFIER
and BQM_IS
in the wrapper function? What about the reactions which have BQM_OCCURS_IN
since they are part of a pathway?
Maybe you can have a look at the way I implemented it now. polish_annotations and change_all_qualifiers are now part of the polish
fucntion. I only used BQM_IS
qualifiers at the moment-
I don't really see how I would add add_new_curie_set
seperately now that both functions are included. But maybe I am missing something and this will provoke unwanted behavior. I will test it on one of my models later to see what happens. You might see it directly since you implemented the other functions :D
I took a look and stated in #58 already that for the GeneProduct we would have to set it to isHomologTo
if the model was created directly from a lab strain. Additionally, if we use BIOLOGICAL_QUALIFIER
would it not be then BQB_IS
?
Regarding other entities, so for units and UnitDefinitions I used the MODEL_QUALIFIER
as this makes more sense if one looks at the description here: https://web.archive.org/web/20211204231945/http://co.mbine.org/specifications/qualifiers. There the qualifier would be BQM_IS
except for the Millimoles per gram (dry weight) per hour
as there is a pubmed annotation contained which has the qualifier bqmodel:isDescribedBy
.
To the issue with the BQB_OCCURS_IN
, are we able to check for reaction IDs to solve this?
add_curie_set
: Takes a set of complete MIRIAM CURIEs & adds those to a new CVTerm object
→ Was implemented for change_qualifiers
as only the qualifiers should be changed
add_new_curie_set
: Takes a dictionary with the database prefix as key and the identifiers belonging to that database as set, generates complete MIRIAM CURIEs from the dictionary with the user-specified pattern & adds those to a new CVTerm
→ Was implemented for polish_annotations
as only the MIRIAM CURIEs should be changed
We could change the behaviour as such that the add_new_curie_set
function is replaced with a generate_new_curie_set
function which gets a dictionary with prefixes mapping to the respective identifier set as well as the boolean new_pattern
. This function then generates a set of CURIEs from the provided dictionary with the provided pattern and returns it. Thus, the result of generate_new_curie_set
can be added to the model with the add_curie_set
function.
change_qualifier_per_entity
for reaction BQB_OCCURS_IN and unit definition BQB_IS_DESCRIBED_BY and keep those. change_all_qualifiers
and keep / change for gene product BQB_IS_HOMOLOG_TO Problems / Questions I ran into:
SBase
object. So I cannot check whether the entity is a Reaction or a UnitDefinition. I left it out for now and I am just checking whether the BiologicalQualifierType is 9 (BQB_OCCURS_IN) or 6 (BSB_IS_HOMOLOG_TO) and then I print to the screen that the CVTerm QualifierType is not changed.for i in range(len(cvterms)):
to iterate through the CVTerms did not work for me. I used for cvterm in cvterms:
- but that makes deleting the respective CVTerm harder. I just left it out at the moment. It seems to not have any negative impact on the Model I tested this with.Tested the script on all of my three models (https://github.com/draeger-lab/Shaemolyticus: ATCC29970
, JCSC1435
& IMITSC147
). Found several bugs and fixed these. The script seems to run now properly and can be integrated into dev
. I also improved some printouts.
Everything seems to be working. For future bugs or ideas we should open a new issue.
Feature A module that can transform all URIs to the same pattern or change the used pattern in a model for the whole model. The function
polish_annotations
from the scriptpolish_after_MP.py
already implements this functionality and could be integrated into refineGEMs as a module. Additional functionality like changing the qualifier types for specific CVTerms could be added as well.