LiuzLab / AI_MARRVEL

AI-MARRVEL (AIM) is an AI system for rare genetic disorder diagnosis
GNU General Public License v3.0

HPO database update #93

Closed: SpicyChicken6 closed this 1 month ago

SpicyChicken6 commented 2 months ago

Please refer to issue #92.


Description

This PR adds two files into the utils folder:

  1. hpo_update.r
  2. parseGeneMap2_output.py (modified from the OMIM-provided parser: https://github.com/OMIM-org/genemap2-parser/tree/master)
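For readers unfamiliar with genemap2, here is a rough Python sketch of the parsing that the OMIM-provided parser applies to the `Phenotypes` column. It is a minimal illustration, not the code in `parseGeneMap2_output.py`. It assumes each semicolon-separated entry follows the pattern `<phenotype>, <6-digit MIM number> (<mapping key>), <inheritance>`, with the inheritance part optional. The `parse_phenotypes` function and the `Phenotype` type are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed layout of one entry in the genemap2 "Phenotypes" column:
#   "<phenotype name>, <6-digit MIM number> (<single-digit mapping key>), <inheritance>"
# Entries are separated by semicolons; the trailing inheritance part is optional.
PHENOTYPE_RE = re.compile(r"^(.*),\s(\d{6})\s\((\d)\)(?:,\s*(.*))?$")


class Phenotype(NamedTuple):
    name: str
    mim_number: Optional[str]
    mapping_key: Optional[int]
    inheritance: Optional[str]


def parse_phenotypes(field: str) -> List[Phenotype]:
    """Split a genemap2 Phenotypes field into structured entries (hypothetical helper)."""
    results: List[Phenotype] = []
    for raw in field.split(";"):
        entry = raw.strip()
        if not entry:
            continue
        match = PHENOTYPE_RE.match(entry)
        if match:
            name, mim, key, inheritance = match.groups()
            results.append(Phenotype(name, mim, int(key), inheritance))
        else:
            # Entries without a phenotype MIM number are kept by name only.
            results.append(Phenotype(entry, None, None, None))
    return results
```

The greedy `(.*)` backtracks to the last `, <MIM> (<key>)` match. Commas inside a phenotype name (e.g. "Alzheimer disease, type 3") therefore stay in the name, and comma-separated inheritance modes stay together in the last field.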

To download and generate the parsed data files, run hpo_update.r. It generates three files: hpo.obo, HPO_OMIM.tsv, and genemap2_v2022.rds. These can directly replace the old data files.
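Here is a minimal sketch of the replacement step. It assumes `hpo_update.r` has already been run. The `generated_dir` and `data_dir` paths and the `replace_data_files` helper are hypothetical, because this PR does not specify where the outputs land or where the dependency folder lives. The helper copies only the three files named above. If any of them is missing, it copies nothing, so the dependency folder is never left half-updated.

```python
import shutil
from pathlib import Path
from typing import List

# File names come from the PR description; the directories passed in are hypothetical.
EXPECTED_OUTPUTS = ("hpo.obo", "HPO_OMIM.tsv", "genemap2_v2022.rds")


def replace_data_files(generated_dir: Path, data_dir: Path) -> List[Path]:
    """Copy freshly generated files over the old ones, refusing a partial update."""
    missing = [name for name in EXPECTED_OUTPUTS if not (generated_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"hpo_update.r outputs not found: {missing}")
    data_dir.mkdir(parents=True, exist_ok=True)
    replaced: List[Path] = []
    for name in EXPECTED_OUTPUTS:
        target = data_dir / name
        # copy2 overwrites an existing file and preserves its metadata (e.g. mtime).
        shutil.copy2(generated_dir / name, target)
        replaced.append(target)
    return replaced
```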

Compared with the 2022 data, the 2024 data provides ~2,000+ more gene-associated phenotype terms from OMIM.
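One way to check a figure like this is to diff the old and new `HPO_OMIM.tsv`. The sketch below is written under stated assumptions. It assumes the file is tab-separated with a header row, and it uses `gene_symbol` and `hpo_id` as hypothetical placeholder column names, since the actual layout is not described here. It reports both the genes that newly carry phenotype terms and the new gene-term pairs.

```python
import csv
from pathlib import Path
from typing import Set, Tuple

Pair = Tuple[str, str]  # (gene symbol, phenotype term ID)


def load_pairs(tsv_path: Path, gene_column: str, term_column: str) -> Set[Pair]:
    """Read distinct (gene, term) pairs, skipping rows that lack either value."""
    pairs: Set[Pair] = set()
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            gene = (row.get(gene_column) or "").strip()
            term = (row.get(term_column) or "").strip()
            if gene and term:
                pairs.add((gene, term))
    return pairs


def compare_versions(
    old_tsv: Path,
    new_tsv: Path,
    gene_column: str = "gene_symbol",  # hypothetical column name
    term_column: str = "hpo_id",  # hypothetical column name
) -> Tuple[Set[str], Set[Pair]]:
    """Return (genes gained, gene-term pairs gained) going from old to new."""
    old_pairs = load_pairs(old_tsv, gene_column, term_column)
    new_pairs = load_pairs(new_tsv, gene_column, term_column)
    old_genes = {gene for gene, _ in old_pairs}
    new_genes = {gene for gene, _ in new_pairs}
    return new_genes - old_genes, new_pairs - old_pairs
```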

Note

In the data dependency folder, the genemap2 data file is named genemap2_v2022.rds. To avoid inconsistencies, the new data file keeps the same name even though it is built from 2024 data rather than 2022 data.


Test

I ran one AIM Nextflow case with the updated data files, and it produced consistent results with no issues.

Old run: [screenshot]
New run: [screenshot]


SpicyChicken6 commented 2 months ago

It looks good to me, but it will only take effect after we update our S3 data, and we may want to encourage users to sync their AIM data dependencies.

Although it uses the misleading name v2022 for data built in 2024, I think keeping the name is better than changing it, because it minimizes the chance of errors for users who still have an old version of the data dependencies.

Sounds good, thanks Jaeyeon

hyunhwan-bcm commented 1 month ago

@SpicyChicken6 I see some differences between the two screenshots. Can you visualize each affected column before and after?
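Instead of screenshots, you could answer this with a per-column diff of the two result tables. This sketch assumes both runs were exported to CSV with a header row and a unique row identifier. The key column name (for example `variant_id`), the file paths, and the helper names are hypothetical. Rows present in only one run are ignored.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple

Change = Tuple[str, str, str]  # (row key, old value, new value)


def load_by_key(path: Path, key: str) -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Return the header and the rows indexed by the key column."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = {row[key]: row for row in reader}
        return list(reader.fieldnames or []), rows


def column_changes(old_csv: Path, new_csv: Path, key: str) -> Dict[str, List[Change]]:
    """For each column present in both runs, list the rows whose value changed."""
    old_fields, old_rows = load_by_key(old_csv, key)
    new_fields, new_rows = load_by_key(new_csv, key)
    columns = [c for c in old_fields if c in new_fields and c != key]
    shared = sorted(old_rows.keys() & new_rows.keys())  # rows found in both runs
    changes: Dict[str, List[Change]] = {}
    for column in columns:
        diffs = [
            (k, old_rows[k][column], new_rows[k][column])
            for k in shared
            if old_rows[k][column] != new_rows[k][column]
        ]
        if diffs:
            changes[column] = diffs
    return changes
```

Printing `len(diffs)` for each column in the result gives a quick count of affected rows per column. The lists themselves can feed a before/after plot.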

hyunhwan-bcm commented 1 month ago

@jylee-bcm can you test whether this works?

jylee-bcm commented 1 month ago

@hyunhwan-bcm @SpicyChicken6 Sorry for the delay. I just tested it and confirmed that it generates the files without any issues.