AdamaJava / adamajava


Vcf qio #237

Closed: ChristinaXu2017 closed this 3 years ago

ChristinaXu2017 commented 3 years ago

Description

Type of change


How Has This Been Tested?

Unit tests have been updated; the TC regression test will cover most AdamaJava tools.

Checklist:

ChristinaXu2017 commented 3 years ago

Have you had a chance to manually test these changes on the annotation process? It would be very comforting to know that identical output is produced (to say a production run) and that it doesn't take any longer to run.

It failed on the qsv call. I am investigating now; details are in /working/genomeinfo/cromwell-test/cromwell-executions/somaticDnaFastqToMaf/9037df1e-36fe-4803-8174-4a1bc7ee9103/call-qsvControlVsTest/execution

ChristinaXu2017 commented 3 years ago

I pushed my local build jars to /software/adamajava/adamajava-release/xuTest_adamaJavaVersion and then ran somaticDnaFastqToMaf with the jars in this folder. It failed during qsvControlVsTest, so I ran qsv again with both our nightly version and xuTest_adamaJavaVersion. Both succeeded, and the results passed our regression_GS_latest_bams.sh, regression_GS_latest_mafs.sh and regression_GS_latest_xmls.sh checks.

ChristinaXu2017 commented 3 years ago

This pull request makes no changes to qsv. qsv is the only step that failed during our manual test, and it succeeded on re-run. It seems there is a random exception inside qsv that has nothing to do with this pull request.

ChristinaXu2017 commented 3 years ago

After a 10-hour run, the WGS job failed due to a permission issue. I have to change the root dir to temp/mock_analysis and re-run it.

ChristinaXu2017 commented 3 years ago

The Cromwell job died again because it exceeded the walltime.

ChristinaXu2017 commented 3 years ago

It seems there are different WDLs for somaticFastqtoMaf. The one in our genome info svn may not cope with the WGS. As a developer, it is impossible to run on such a big data set. We can only run on a small dataset with limited resources. The best way to test it is to ask an expert to run it before release (tag it). This is why we have merge, release, adama-nightly, adama-current.

holmeso commented 3 years ago

It seems there are different WDLs for somaticFastqtoMaf. The one in our genome info svn may not cope with the WGS.

The wdl should work against WGS datasets; that's what is used in production currently. Perhaps you chose the wrong wdl? Would it not be simpler to just run the annotation process against bams that already exist? Running the whole wdl seems overkill just to test the reading/writing of the vcf files.

As a developer, it is impossible to run on such a big data set. We can only run on a small dataset with limited resources. The best way to test it is to ask an expert to run it before release (tag it).

I'm not sure I agree with this. As developers we need to ensure that the code we have written has been tested against data as close to production-level data as possible. If this is not possible for developers to do, then I'm not sure how feasible it's going to be for other people to do.

ChristinaXu2017 commented 3 years ago

There are several copies of this WDL pipeline. I picked up the one from the genome svn, but our production team uses the one from a different location, e.g. /working/lab_nic. They are slightly different; for example, the parameter settings inside qprofiler2.wdl differ, and you can't see the difference from the JSON input file.

This pipeline calls more than just our adamajava tools, so we are not the right people to troubleshoot it. What we should do is test only our adamajava tools, e.g. qannotate, q3indel, qprofiler, etc. I have to give up on running somaticFastqtoMaf on a WGS dataset.

ChristinaXu2017 commented 3 years ago

It is outdated; there is no value in merging it to master.