NBISweden / Earth-Biogenome-Project-pilot

Assembly and Annotation workflows for analysing data in the Earth Biogenome Project pilot project.
https://www.earthbiogenome.org/
GNU General Public License v3.0

Alternative mitogenome strategy needed when MitoHiFi fails. #86

Open gbdias opened 8 months ago

gbdias commented 8 months ago

Is your feature request related to a problem? Please describe.

In my experience MitoHiFi often fails, and I then need another strategy to assemble the mtDNA (plus additional organelles).

Describe the solution you'd like

I suggest that a failure in the MitoHiFi process be ignored so that the rest of the workflow proceeds normally, e.g.:

errorStrategy 'ignore'

mahesh-panchal commented 8 months ago

I would prefer that this is added to the run's nextflow.config rather than to the whole workflow, but the underlying question is why MitoHiFi is not performing here. Can you elaborate on the problems?
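
A minimal sketch of what that could look like in the run's nextflow.config, assuming the process name MITOHIFI_MITOHIFI as it appears in the Nextflow trace (the selector may need adjusting if the process is called from inside a subworkflow):

```groovy
// Run-level nextflow.config: scope the setting to the MitoHiFi process only,
// so failures in every other process still stop the run.
// Assumption: the process is named MITOHIFI_MITOHIFI, as in the trace output.
process {
    withName: 'MITOHIFI_MITOHIFI' {
        errorStrategy = 'ignore'
    }
}
```

Keeping this in the run config rather than in the workflow leaves the pipeline default unchanged for datasets where MitoHiFi works.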

gbdias commented 8 months ago
mahesh-panchal commented 8 months ago

Then let's create a strategy to deal with it.

So what exactly is the cause of failure? Does MitoHiFi itself exit with an error (a non-zero exit code), or does Nextflow exit with an error because certain output files are missing? In the latter case we can make a PR to change the module on nf-core modules, or make a local module with that behaviour; an empty output could then trigger a different process.
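
If it turns out to be a non-zero exit code, the ignore could be narrowed to that code instead of covering every failure. A hedged sketch for the run's nextflow.config follows; the exit status 1 and the process name MITOHIFI_MITOHIFI are assumptions for illustration and would need to be checked against the actual run:

```groovy
// Assumption: MitoHiFi signals this failure with exit status 1.
// Any other exit status (e.g. an out-of-memory kill) still terminates the run.
process {
    withName: 'MITOHIFI_MITOHIFI' {
        errorStrategy = { task.exitStatus == 1 ? 'ignore' : 'terminate' }
    }
}
```

Note that MitoHiFi may return the same exit status for unrelated errors, so this only narrows the ignore, it does not identify the cause.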

It seems we should add a section to do assembly from reads. Should we make this the default, rather than assembly from contigs? Should this be a complementary assembly strategy like the hifiasm assembly (e.g. so one can assemble just the organelles, or choose mitochondria/chloroplast specifically)?

gbdias commented 8 months ago

Exit status 1, failed

22      97/a7593a       45307198        EVALUATE_RAW_ASSEMBLY:MERQURYFK_MERQURYFK (hifiasm-raw-default) COMPLETED       0       2024-03-01 04:18:16.174 4m 44s  52.7s   243.2%  1.1 GB  11.5 GB 11.4 GB 3.3 GB
23      9c/8ef0a5       45307224        MITOHIFI_MITOHIFI (hifiasm-raw-default) FAILED  1       2024-03-01 04:22:20.746 4m 15s  25s     -       -       -       -       -
19      00/6cbb7e       45307192        EVALUATE_RAW_ASSEMBLY:BUSCO (hifiasm-raw-default-basidiomycota_odb10)   ABORTED -       2024-03-01 04:18:15.646 -       -       -       -       -       -       -
Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  INFO:    Environment variable SINGULARITYENV_SNIC_TMP is set, but APPTAINERENV_SNIC_TMP is preferred
  Matplotlib created a temporary config/cache directory at /scratch/45307224/matplotlib-2e7gi_8t because the default path (/home/guibo205/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
  Attention!
  'parsed_blast.txt' and 'parsed_blast_all.txt' files are empty.
  The pipeline has stopped !! You need to run further scripts to check if you have mito reads pulled to a large NUMT!
mahesh-panchal commented 8 months ago

There's roughly one alternative strategy in the notes you listed. Did you do the same (i.e. use MBG)? Did that give you a mitogenome? What about oatk?

gbdias commented 8 months ago

Yes. I pulled mtDNA reads using minimap2 and a reference, then assembled them de novo using MBG. I'm not sure it is worth implementing this in a module right now, because multiple k-mer and window sizes need to be tested until you get a single circular contig. I'm not sure a set of default parameters would transfer well between datasets, but it could be a target for some development from us.

Haven't tried oatk yet.

mahesh-panchal commented 8 months ago

Will leave the issue open as a reminder to implement an alternative mitogenome strategy.

Related paper: MBG https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8521641/

mahesh-panchal commented 8 months ago

Note: Update the directive value to ignoreThenFail when it gets merged into core Nextflow.