SpicyChicken6 closed this 1 month ago
It looks good to me. However, it will only take effect after we update our S3 storage, so we should encourage users to sync their AIM data dependencies.
Although it uses the misleading name v2022 for data produced in 2024, I think keeping the current name is better than changing it, because it minimizes the possibility of errors for users who still have an old version of the data dependencies.
Sounds good, thanks Jaeyeon
@SpicyChicken6 I see some differences between the two screenshots. Can you visualize each affected column before and after the update?
@jylee-bcm can you test whether this is working?
@hyunhwan-bcm @SpicyChicken6 Sorry for the delay. I just tested it and confirmed that it generates the files without any issues.
Please refer to issue #92.
Description
This PR adds two files to the utils folder:
To download the source data and generate the parsed, organized data files, run hpo_update.r. It generates three files: hpo.obo, HPO_OMIM.tsv, and genemap2_v2022.rds, which can simply replace the old data files.
Compared with the 2022 data version, the 2024 data provides roughly 2,000 or more additional gene-associated phenotype terms from OMIM.
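As a minimal sketch (not part of this PR), the Python check below confirms that a run of hpo_update.r produced all three expected files, and that none of them is empty, before they replace the old data dependencies. The function name `missing_outputs` and the assumption that all three files are written to a single output directory are hypothetical.

```python
from pathlib import Path

# File names come from the PR description; the single-directory layout is an assumption.
EXPECTED_OUTPUTS = ("hpo.obo", "HPO_OMIM.tsv", "genemap2_v2022.rds")


def missing_outputs(output_dir):
    """Return the expected hpo_update.r outputs that are absent or empty in output_dir.

    Hypothetical helper: it only checks existence and non-zero size, not file contents.
    """
    base = Path(output_dir)
    missing = []
    for name in EXPECTED_OUTPUTS:
        path = base / name
        # Treat a missing file and a zero-byte file the same way: both mean the run failed.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

For example, `missing_outputs("path/to/output")` (a placeholder path) returns an empty list only when all three files exist and are non-empty, so an empty result is a reasonable precondition for copying them over the old data files.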
Note
In the data dependency folder, the genemap2 data is named genemap2_v2022.rds. To avoid inconsistency issues, the new data file keeps the same name even though the data is from 2024 rather than 2022.
Test
I ran one AIM Nextflow case with the updated data files, and it produced consistent results with no issues.
Old run: (screenshot)
New run: (screenshot)