TheJacksonLaboratory / LIRICAL

LIkelihood Ratio Interpretation of Clinical AbnormaLities

https://thejacksonlaboratory.github.io/LIRICAL/stable

exomiser flag to exomiserDatabase #614

Closed: lauragails closed this issue 1 year ago.

lauragails commented 1 year ago

Hello, thank you for this software! I am running lirical-cli-2.0.0-RC1.jar and have a few questions:

I noticed that the documentation for the -e flags was inconsistent with the tutorial here:

This suggests: -e38 | --exomiser-hg38: path to Exomiser variant database for hg38. Required if the analysis is run with exome/genome sequencing files and --assembly is set to hg38.

Note that the -e38 flag throws an error and does not appear as an option when you run lirical prioritize -h.

I am wondering if this has something to do with the error I just received: org.monarchinitiative.phenol.base.PhenolRuntimeException: TermId construction error: 'database_id' does not have a prefix!

My full command is java -jar lirical-cli-2.0.0-RC1.jar prioritize --display-all-variants --use-orphanet --vcf path/to/file.vcf.gz --assembly hg38 -d /path/to/v2.0.0.-RC1_prerelease/data/ -e /path/to/data/EXOMISER/2302_hg38/ -o OUTDIR/ -p HP:0000717 --sample-id ID --sex FEMALE -x ID -f html,tsv --transcript-db REFSEQ

Also, how do you set --ddndv to false?

Thank you!

ielis commented 1 year ago

Hi, the -e38 problem is likely due to a version mismatch. You're running v2.0.0-RC1, but the docs apply to v2.0.0-RC2, so updating to RC2 should resolve it. Having said that, the code is still "fresh" and bugs like this are likely to happen, so please let me know if something doesn't work after the update.

Updating to LIRICAL v2.0.0-RC2 will also resolve the PhenolRuntimeException in LIRICAL RC1, which uses an older version of the phenol library to parse the HPO annotation (HPOA) file. The HPOA format changed recently, and the older phenol does not support the new format :/

Otherwise, the command looks OK for RC2, except for the -e option.
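Applied to the command from the question, the RC2 change comes down to swapping -e for -e38, the hg38 Exomiser option quoted from the RC2 docs above. The sketch below wraps the adjusted invocation in a hypothetical bash function, lirical_rc2_example, so it can be printed with DRY_RUN=1 before running it. The paths are placeholders, and pointing -e38 at the Exomiser variants database file (2302_hg38_variants.mv.db) rather than at the 2302_hg38/ directory is an assumption; lirical prioritize -h in RC2 is the authority on what the option expects. Output formats are requested with one -f per format.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the command from the question, adapted for
# v2.0.0-RC2. All paths are placeholders; substitute your own.
lirical_rc2_example() {
  local -a cmd
  # Assumption: -e38 points at the hg38 Exomiser variants database file,
  # not at the 2302_hg38/ directory passed to -e in the original command.
  # Output formats are requested with one -f per format.
  cmd=(
    java -jar lirical-cli-2.0.0-RC2.jar prioritize
    --display-all-variants --use-orphanet
    --vcf path/to/file.vcf.gz --assembly hg38
    -d /path/to/lirical/data/
    -e38 /path/to/data/EXOMISER/2302_hg38/2302_hg38_variants.mv.db
    -o OUTDIR/ -p HP:0000717
    --sample-id ID --sex FEMALE -x ID
    -f html -f tsv
    --transcript-db REFSEQ
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running DRY_RUN=1 lirical_rc2_example first makes it easy to compare the command against lirical prioritize -h before launching a long analysis.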

Last, --ddndv actually cannot be unset; you found a real bug. Thanks :) I'll fix it and let you know when the fix is out.

lauragails commented 1 year ago

Got it!

FYI for others who find this thread: I went with the pre-built executable because compiling required java/20+36-2344/jdk-20, which wasn't available on our HPC. With this version, the code in the tutorial works seamlessly.

I'm trying it now and will re-run once I hear back about --ddndv. Thank you for fixing it!

lauragails commented 1 year ago

I think it's running now!

I have two other feature requests. First, would there be a way to allow multiple output types? I'm running with html now, but I would also love tsv output. Is there a way to get both in one run? If not, it's not the end of the world.

Second, running one VCF at a time is workable, but for larger-scale runs it would be fantastic to use a merged VCF, so that I could run on a joint-called cohort without having to pre-split the files.

Edit: it looks like the sample name flag is checked against the VCF, so I assume you can pass the same VCF for every run, specify the sample name, and have the matching sample pulled out. I haven't tried it yet, though.

Thank you for considering!

ielis commented 1 year ago

Hi, the latest LIRICAL now has an --sdwndv flag, the logical opposite of the now-deprecated --ddndv flag. The default behavior (without --sdwndv on the command line) is the same as in the previous version, but adding --sdwndv includes all diseases in the report, which was impossible with --ddndv.

To get the latest version, you'll have to build it from source as described in the docs. I recommend copying the distribution ZIP from the lirical-cli/target/ folder to another location and running it from there.

Regarding the feature requests: it should already be possible to produce multiple output types by repeating -f once per format; try something like lirical ... -f html -f json

As for merged VCFs, there are two contexts: family and cohort. Running LIRICAL on a family (e.g. a trio) is on our radar, but implementing inheritance-based filtering and prioritization is not trivial, so it will take some time. However, analyzing a single sample from a merged VCF should already work if you provide the correct sample identifier, as you pointed out in your edit.
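Since a single sample can be pulled out of a merged VCF by its identifier, a joint-called cohort can be processed by running LIRICAL once per sample. The sketch below assumes a hypothetical bash helper, run_lirical_per_sample (not part of LIRICAL), that reads lines of the form sample_id<TAB>phenotypes on stdin and passes the phenotype field verbatim to -p, because each patient usually has different HPO terms. It also assumes -x sets the output file prefix, as the command in the question suggests, so each sample's reports get distinct names. LIRICAL_JAR and DRY_RUN are illustrative environment variables, and any extra arguments are forwarded unchanged to every run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LIRICAL): run `lirical prioritize` once per
# sample of one joint-called VCF. Reads "<sample_id><TAB><phenotypes>" lines
# from stdin; the phenotype field is passed to -p verbatim, and any extra
# arguments are forwarded unchanged to every run.
run_lirical_per_sample() {
  local vcf=$1 outdir=$2
  shift 2
  local sample phenotypes
  local -a cmd
  # The `|| [[ -n ... ]]` keeps a final line that lacks a trailing newline.
  while IFS=$'\t' read -r sample phenotypes || [[ -n "$sample" ]]; do
    [[ -z "$sample" ]] && continue   # skip blank lines
    # Assumption: -x sets the output file prefix, so reports don't collide.
    cmd=(java -jar "${LIRICAL_JAR:-lirical-cli.jar}" prioritize
         --vcf "$vcf" --sample-id "$sample" -p "$phenotypes"
         -o "$outdir" -x "$sample" "$@")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      printf '%s\n' "${cmd[*]}"
    else
      # Keep the child process from reading (and consuming) the sample list.
      "${cmd[@]}" < /dev/null
    fi
  done
}

# Usage (all paths are placeholders):
#   DRY_RUN=1 run_lirical_per_sample cohort.vcf.gz OUTDIR/ \
#       --assembly hg38 -d /path/to/data/ -f html -f tsv < samples.tsv
```

The sample IDs for building samples.tsv can be listed with bcftools query -l cohort.vcf.gz. Runs are sequential here; on an HPC, each line could instead become its own job, which this sketch does not attempt.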

lauragails commented 1 year ago

Thank you! The commands in the docs pull RC2, so that flag can't be used (yet).

Running cd LIRICAL/lirical-cli/target; ls gives:

apidocs            javadoc-bundle-options                  lirical-cli-2.0.0-RC2-javadoc.jar    test-classes
archive-tmp        lib                                     maven-archiver
classes            lirical-cli-2.0.0-RC2-distribution.zip  maven-javadoc-plugin-stale-data.txt
generated-sources  lirical-cli-2.0.0-RC2.jar               maven-status

ielis commented 1 year ago

Ah, yes. The docs for the latest version don't tell you to check out the develop branch. This is what you actually need to do to build the latest LIRICAL:

git clone https://github.com/TheJacksonLaboratory/LIRICAL.git
cd LIRICAL
# switch to the dev branch
git checkout develop
# ensure you've got the latest code
git pull

./mvnw -Prelease clean package

Of course, feel free to skip the clone if you already have the repo somewhere. You just need to check out the develop branch and make sure your local copy has the latest code. In the end, you should have a ZIP file in the lirical-cli/target folder, and the rest of the docs should apply.
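The copy-and-unpack step recommended earlier in the thread (running LIRICAL from a copy of the distribution ZIP rather than from lirical-cli/target/) could be scripted roughly as follows. install_lirical_dist is a hypothetical helper; it assumes the ZIP name matches lirical-cli-*-distribution.zip, as in the directory listing above, and that unzip is installed. It refuses to continue unless exactly one matching ZIP exists, which is what a fresh clean package build should leave behind.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the distribution ZIP produced by
# `./mvnw -Prelease clean package` out of lirical-cli/target/ and unpack it
# in a separate directory, leaving the build tree untouched.
install_lirical_dist() {
  local repo=$1 dest=$2
  local -a zips
  zips=("$repo"/lirical-cli/target/lirical-cli-*-distribution.zip)
  # Without a match the glob stays literal, so also check that the file exists.
  if [[ ${#zips[@]} -ne 1 || ! -f ${zips[0]} ]]; then
    echo "expected exactly one distribution ZIP in $repo/lirical-cli/target/" >&2
    return 1
  fi
  mkdir -p "$dest"
  cp "${zips[0]}" "$dest"/
  # Unpack inside $dest; -o overwrites without prompting, -q keeps it quiet.
  (cd "$dest" && unzip -oq "$(basename "${zips[0]}")")
}

# Usage (paths are placeholders):
#   install_lirical_dist ~/src/LIRICAL ~/opt/lirical
```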