cheng-yu-zhang closed this pull request 2 years ago.
@cheng-yu-zhang For each pull request, please summarize the detailed work that you have done so that it will be easier for other people to review it.
@hongzhonglu I have added more details to the comments.
Hi @cheng-yu-zhang, thanks for this update! Nice work!
The growth test results for the updated model are essentially the same as for the model in the devel branch. The accuracy of the gene essentiality test also remains the same (0.89). However, two genes, YKR072C and YOR054C, are now false negatives (experimentally viable, but inviable in the model upon deletion). Please double-check the reactions associated with these two genes.
You mentioned that you added 7 new genes, but according to the README file, the gene count has changed from 1150 to 1161 (11 more genes). Please check this.
It would be better to have a literature or database reference for every change so that each annotation can be traced back to its source. This could be either an extra column in "databasenewGPR.tsv" or a summary table here (see the example below). It would make the model curation more transparent. @edkerk @hongzhonglu, what do you think?
For example:
Genes | Related reactions | Reference |
---|---|---|
YGL119W | fill this | fill this |
YGR147C | fill this | fill this |

Genes | Related reactions | Reference |
---|---|---|
YPR165W | | |
YFR049W | | |
YOR253W | | |
YGR038W | | |
YLR350W | | |
YBR128C | | |
YLR211C | | |
YLR360W | | |
YPL120W | | |
YFR021W | | |
YNL054W | | |
YGR106C | | |
YPR170W-B | | |

Genes | Related reactions | Reference |
---|---|---|
gene | | |
@feiranl There should indeed be an explanation of why these curations were performed. The PR text mentions that they were manually curated by consulting different databases, but which database suggests which change? Do the databases agree? Are there conflicts? Also, some genes are removed; how confident are we about this?
I have rebased this PR onto the latest `develop` branch so that the model files can be generated. I also refactored the code to use only RAVEN functions, following #301.
Instead of modifying existing files that were used for previous curations (`databasenewGPR.tsv`), it is better to make a dedicated file for this particular curation. See, for instance, #300 and #304, where separate folders containing those files were created (here, a single file would be sufficient).
@edkerk Instead of making a new file "DBnewRxnsGenes.tsv" that details the new genes, could I add another file, perhaps named "databasenewGPR_proof.tsv", to explain why these curations were performed? For example:

| rxnID_yeast_model | genes_yeast_model | final_GPR | reference |
|---|---|---|---|
| r_0005 | YGR032W or YMR306W | YMR306W | web link or paper |
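If a proof file with these columns were adopted, a short script could help reviewers answer the questions raised above (which genes are added or removed per reaction, and whether every change has a reference). The sketch below is illustrative only: it is written in Python rather than the repository's MATLAB, it assumes a tab-separated file with exactly the column names proposed here, and the helper names `gpr_genes` and `audit_curations` are hypothetical.

```python
import csv
import io
import re


def gpr_genes(gpr):
    """Return the set of gene IDs in a GPR string, ignoring 'and', 'or' and parentheses."""
    tokens = re.split(r"[\s()]+", gpr or "")
    return {t for t in tokens if t and t.lower() not in ("and", "or")}


def audit_curations(tsv_text):
    """For each curated reaction, list the genes added/removed and whether a reference is given.

    Assumes the hypothetical columns: rxnID_yeast_model, genes_yeast_model,
    final_GPR, reference (tab-separated, with a header row).
    """
    report = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        old = gpr_genes(row["genes_yeast_model"])
        new = gpr_genes(row["final_GPR"])
        report.append({
            "rxn": row["rxnID_yeast_model"],
            "added": sorted(new - old),
            "removed": sorted(old - new),
            # An empty or missing reference cell is flagged as False.
            "has_reference": bool((row.get("reference") or "").strip()),
        })
    return report


# Usage with the example row above:
sample = (
    "rxnID_yeast_model\tgenes_yeast_model\tfinal_GPR\treference\n"
    "r_0005\tYGR032W or YMR306W\tYMR306W\tweb link or paper\n"
)
for entry in audit_curations(sample):
    print(entry)
# {'rxn': 'r_0005', 'added': [], 'removed': ['YGR032W'], 'has_reference': True}
```

Taking the union of the "added" and "removed" lists over all rows would also give a quick cross-check against the change in total gene count reported in the README.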
@cheng-yu-zhang The `refseq` column of the gene metadata should not contain the nucleotide sequence, but an NCBI nucleotide identifier.
code/modelTests/essentialGenes.m
Could you also run the growth tests? They normally run successfully, but this would make sure that we have a functional model. @hongzhonglu @edkerk @cheng-yu-zhang I think it may be time to add more tests after each update to ensure quality. We currently have `essentialGenes` and `growth`, but maybe we could add a separate flux check based on 13C data? That way, we would know that the flux predictions are getting better, or at least not worse. What do you think?
That is a very nice suggestion. More tests will help ensure that the model's prediction quality improves consistently. @cheng-yu-zhang @feiranl
Main improvements in this PR:
I manually checked all 209 complex annotations in yeast8.5 against UniProt, SGD and Complex Portal, and applied "addDBNewGeneAnnotation.m" to correct 45 complex annotations that were wrong or incomplete.
The explanation is in the file "explanation.docx".
"Yeast_complex_portal_2022.tsv" contains the latest complex information downloaded from Complex Portal. This file and the Complex Portal website are the most important references; UniProt and SGD were used as supplementary sources.
I hereby confirm that I have:
selected `develop` as the target branch (top left drop-down menu)