nanjiangshu closed 3 years ago
Hmm I'm not sure about this one.
It might be that between subsequent versions, new genes are added to Human-GEM. This doesn't happen that often though. Anyway, I'm wondering if it's not more responsible of us to do the filtering on-demand, at each release, rather than having a static dataset that needs to be filtered manually.
By filtering, do you mean removing the genes that are in the HPA data but do not exist in the Human-GEM model? That is done with this line in the parsing script. I don't think the filtering is a big problem. We update `hpaRna.tsv` whenever `rna_tissue_hpa.tsv` or `Human-GEM.yml` is updated, and I guess that won't happen very often. If we really want to automate it, we could add a version tag and integrate the formatting script into `generate-data`.
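For reference, the filtering step discussed here can be sketched roughly as follows. This is only an illustration, not the actual parsing script: the function names, the `Gene` column name, and the gene IDs in the usage note are assumptions made up for the example, and the set of model gene IDs is assumed to have already been extracted from `Human-GEM.yml`.

```python
import csv
import io


def filter_hpa_rows(hpa_tsv_text, model_gene_ids, gene_column="Gene"):
    """Keep only the HPA rows whose gene ID is present in the model.

    `gene_column` is a hypothetical column name; adjust it to the real header.
    Returns the header fields and the list of kept rows (as dicts).
    """
    reader = csv.DictReader(io.StringIO(hpa_tsv_text), delimiter="\t")
    kept = [row for row in reader if row[gene_column] in model_gene_ids]
    return reader.fieldnames, kept


def to_tsv(fieldnames, rows):
    """Serialize the kept rows back to TSV text with the original header."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Calling `filter_hpa_rows(text, {"ENSG00000000001"})` on a TSV with one row per gene would keep only the row for that (made-up) ID. Rerunning a step like this whenever either input file changes is what integrating it into `generate-data` would amount to.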
Sounds good to me. Could you then please update https://github.com/MetabolicAtlas/data-files/blob/main/DATA_OVERLAY.md with a section detailing this update procedure?
I've created a new issue for it.
This PR together with PR 18 and PR 699 closes #124
Since the HPA data for DataOverlay will be provided as the transcriptomics tsv file for Human-GEM, the parsing code in data-generation is no longer needed.