genome-nexus / genome-nexus-importer

Import data into MongoDB for use by https://github.com/genome-nexus/genome-nexus/
MIT License
4 stars, 16 forks

add mutation assessor data to mongo #53

Closed: nr23730 closed this 2 years ago

nr23730 commented 2 years ago

@inodb As discussed on Slack, the Mutation Assessor service is offline. It is still possible, however, to load the dataset into MongoDB and serve annotations from there. To keep GN from routing requests into nowhere, I set its mutationAssessor.url parameter to:

http://127.0.0.1/VARIANT&frm=json&fts=input,rgaa,rgvt,msa,pdb,F_impact,F_score,vc_score,vs_score,info,var,gene,uprot,rsprot,gaps,msa_height,chr,rs_pos,rs_res,up_pos,up_res,cnt_cosmic,cnt_snps
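A minimal sketch of why this setting keeps requests local, assuming GN expands mutationAssessor.url by replacing the VARIANT token with the variant string (the thread does not show how GN does this). The helper build_request_url and the variant encoding used below are hypothetical:

```python
from urllib.parse import urlsplit

# Template from the comment above; "VARIANT" is the placeholder token.
MUTATION_ASSESSOR_URL = (
    "http://127.0.0.1/VARIANT&frm=json&fts=input,rgaa,rgvt,msa,pdb,F_impact,"
    "F_score,vc_score,vs_score,info,var,gene,uprot,rsprot,gaps,msa_height,"
    "chr,rs_pos,rs_res,up_pos,up_res,cnt_cosmic,cnt_snps"
)


def build_request_url(template: str, variant: str) -> str:
    """Hypothetical helper: replace the first VARIANT token with a variant string."""
    return template.replace("VARIANT", variant, 1)


if __name__ == "__main__":
    # Hypothetical variant encoding, for illustration only.
    url = build_request_url(MUTATION_ASSESSOR_URL, "7,140453136,A,T")
    print(urlsplit(url).hostname)  # 127.0.0.1: the request never leaves the host
```

Because the host is the loopback address, a request presumably fails quickly (nothing is listening) rather than waiting on a remote server that no longer answers.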

nr23730 commented 2 years ago

Thanks for reviewing. Following your comment, I'll use SPECIES and REF_ENSEMBL_VERSION to distinguish between species/reference genomes. It might then make sense to provide separate images for different species/reference genomes; I could also come up with a draft for this.

nr23730 commented 2 years ago

@inodb Done! Mutation Assessor data will now only be imported when grch37 is selected.
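A sketch of the kind of check described here, assuming the importer reads SPECIES and REF_ENSEMBL_VERSION from the environment. The value formats ("homo_sapiens", "grch37_ensembl92") and the function name are assumptions, not taken from the importer:

```python
import os
from typing import Mapping


def should_import_mutation_assessor(env: Mapping[str, str]) -> bool:
    """Return True only for the human GRCh37 build.

    Assumed value formats (not confirmed in this thread):
      SPECIES              e.g. "homo_sapiens"
      REF_ENSEMBL_VERSION  e.g. "grch37_ensembl92"
    """
    species = env.get("SPECIES", "").strip().lower()
    ref_version = env.get("REF_ENSEMBL_VERSION", "").strip().lower()
    return species == "homo_sapiens" and ref_version.startswith("grch37")


if __name__ == "__main__":
    if should_import_mutation_assessor(os.environ):
        print("importing Mutation Assessor data")
    else:
        print("skipping Mutation Assessor (not grch37)")
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the check easy to test.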

As this already matters when building the Docker image, I changed that too. The following images (tags) are now created:

- grch37: latest, grch37-latest, grch37-0.xy
- grch38: grch38-latest, grch38-0.xy
- grcm38: grcm38-latest, grcm38-0.xy
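The tagging scheme in this comment can be expressed as a small function. This is only an illustration of the rule (every genome gets "<genome>-latest" and "<genome>-<version>", and only grch37 also gets the bare "latest"); the function name and the version string are hypothetical, and the actual build scripts may do this differently:

```python
from typing import List


def image_tags(reference_genome: str, version: str) -> List[str]:
    """Docker tags for one reference genome under the scheme in this comment.

    Only grch37 additionally receives the bare "latest" tag.
    """
    tags = [f"{reference_genome}-latest", f"{reference_genome}-{version}"]
    if reference_genome == "grch37":
        tags.insert(0, "latest")
    return tags


if __name__ == "__main__":
    for genome in ("grch37", "grch38", "grcm38"):
        print(genome, image_tags(genome, "0.1"))  # "0.1" is a hypothetical version
```

Keeping "latest" on grch37 presumably preserves the previous default for anyone pulling the untagged image.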

leexgh commented 2 years ago

@nr23730 Thanks for fixing this! Do you have a backend pull request to use the new Mutation Assessor data?

nr23730 commented 2 years ago

Hi @leexgh!

I did not change anything in the backend at all. The data format is compatible with the one already stored in the database, so this should work out of the box, but it will cause timeouts because the backend still tries to access the REST API. For my instance, I simply set the URL to http://127.0.0.1/VARIANT&frm=json&fts=input,rgaa,rgvt,msa,pdb,F_impact,F_score,vc_score,vs_score,info,var,gene,uprot,rsprot,gaps,msa_height,chr,rs_pos,rs_res,up_pos,up_res,cnt_cosmic,cnt_snps so that the request fails silently.
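A rough sketch of the behaviour described here, assuming GN looks up annotations stored in MongoDB before calling the REST API (the comment implies this but does not show GN's code). A plain dict stands in for the database, and annotate and fetch_remote are hypothetical names:

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

Annotation = Dict[str, Any]


def fetch_remote(url: str, timeout: float = 2.0) -> Annotation:
    """Query the REST endpoint; a short timeout bounds how long a dead host can stall."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def annotate(
    variant: str,
    cache: Dict[str, Annotation],
    fetch: Callable[[str], Annotation],
) -> Optional[Annotation]:
    """Return the stored annotation if present; otherwise try the remote service.

    Network failures (URLError, connection refused and timeouts are all OSError
    subclasses) and malformed JSON (ValueError) are swallowed, so a missing
    service yields None instead of an error.
    """
    if variant in cache:
        return cache[variant]
    try:
        return fetch(variant)
    except (OSError, ValueError):
        return None
```

In a real setup, fetch would wrap fetch_remote around the expanded mutationAssessor.url. Passing it in as a parameter keeps the sketch testable without a network.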

leexgh commented 2 years ago

Hi @nr23730, I tried running your script a few times, but it failed. Probably something is wrong with my computer; have you seen this error before?

2022-05-26T01:41:20.907+0000 E STORAGE [thread46] WiredTiger error (28) [1653529280:903560][48:0x7f1f08fe1700], file:collection-0--7571207753880960982.wt, eviction-server: __posix_file_write, 579: /bitnami/mongodb/data/db/collection-0--7571207753880960982.wt: handle-write: pwrite: failed to write 8192 bytes at offset 4186927104: No space left on device Raw: [1653529280:903560][48:0x7f1f08fe1700], file:collection-0--7571207753880960982.wt, eviction-server: __posix_file_write, 579: /bitnami/mongodb/data/db/collection-0--7571207753880960982.wt: handle-write: pwrite: failed to write 8192 bytes at offset 4186927104: No space left on device

How large is your database after importing Mutation Assessor? The error says "No space left on device", but I believe I have enough space. Maybe one of my database settings is different?

nr23730 commented 2 years ago

Hi @leexgh,

This step needs quite a lot of memory; I also had to increase the resources Docker is allowed to use on my machine. I use Docker for macOS, which runs containers in a VM that has its own allocated disk space. I think all of this requires about 40GB.
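Given the roughly 40GB estimate, a pre-import free-space check can catch the "No space left on device" failure early. This is a hypothetical helper, not part of the importer, and the threshold is only the thread's rough figure:

```python
import shutil
import sys

# Rough figure from this thread ("about 40GB"); treat it as an estimate.
REQUIRED_BYTES = 40 * 10**9


def has_enough_space(path: str, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding path has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Pass the MongoDB data directory (e.g. /bitnami/mongodb/data/db from the
    # log above) and run this inside the container, so it sees the Docker VM's
    # disk rather than the host's.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    free_gb = shutil.disk_usage(data_dir).free / 10**9
    print(f"{data_dir}: {free_gb:.1f} GB free, enough: {has_enough_space(data_dir)}")
```

On Docker for macOS, the host can show plenty of free space while the VM's disk image is full, which would match the error above.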

leexgh commented 2 years ago

@nr23730 It would also be good to make Mutation Assessor an optional data source. Its data is large, so people setting up a Genome Nexus instance could choose whether to include it.
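One way such an opt-in could look, as a sketch: the variable name INCLUDE_MUTATION_ASSESSOR is hypothetical (the thread does not name the setting that was actually added), and the import stays off unless it is explicitly enabled:

```python
import os
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}


def mutation_assessor_enabled(
    env: Mapping[str, str], var_name: str = "INCLUDE_MUTATION_ASSESSOR"
) -> bool:
    """Opt-in switch: disabled unless the (hypothetical) variable is set to a truthy value."""
    return env.get(var_name, "").strip().lower() in TRUTHY


if __name__ == "__main__":
    print("Mutation Assessor import enabled:", mutation_assessor_enabled(os.environ))
```

Defaulting to off means a standard setup skips the large dataset, and only instances that need Mutation Assessor pay the extra disk cost.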

nr23730 commented 2 years ago

Hi @leexgh! Thanks a lot for your comments. Everything should be resolved by now.