Closed dnharry closed 8 months ago
Hi @dnharry!
It looks like one of these VCF files has a variant entry with DP=0, which is causing the division-by-zero error. Could you inspect the VCF files in /mnt/c/Users/Atajera_/epi2melabs/instances/wf-amplicon_01HC21YGHFCZ8X2RP1399VA2SP/work/c2/260a4423e104495b8e9df40601ae04/data?
In case you're unfamiliar with WSL, please let me know and I will send detailed instructions on how to do this.
Thank you for the response
There is no VCF file. Attached are the files.
Hmm, the directory in the screenshot is not the working directory of the failed process (.../work/2c/aaf... instead of .../work/c2/260...). When navigating there, might you have swapped the letters of c2 to 2c?
In any case, please make sure to go to /mnt/c/Users/Atajera_/epi2melabs/instances/wf-amplicon_01HC21YGHFCZ8X2RP1399VA2SP/work/c2/260a4423e104495b8e9df40601ae04/data, and there you should find VCF files in the different barcode directories.
Sorry, my bad.
Nice, this looks like the right location. These are symlinks pointing to the directories containing the files needed for creating the report. I'm afraid you won't be able to follow the symlink in the Windows File Explorer though (I didn't think about this earlier). Instead, please follow the steps below:
Open PowerShell (or the Command Prompt if you prefer) and type wsl -d epi2me. This should drop you in the Linux shell of the EPI2ME WSL distribution that was installed alongside the EPI2ME desktop app.
Now, move to the work directory of the failed process by pasting
cd /mnt/c/Users/Atajera_/epi2melabs/instances/wf-amplicon_01HC21YGHFCZ8X2RP1399VA2SP/work/c2/260a4423e104495b8e9df40601ae04/data
(note the use of the cd command to change the directory).
Then, run
ls | xargs -i bash -c 'cp {}/*vcf.gz {}.vcf.gz'
to copy all VCFs from the barcode subdirs into the current dir (while also renaming them).
Next, we can make a new subdirectory, place all VCFs in it, and package it into a neat tarball so that you'll only have to upload a single file. Run the command below:
mkdir vcfs && mv *vcf.gz vcfs && tar czf vcfs.tar.gz vcfs
With a little luck the above all worked, and if you run ls now, you should see the barcode symlinks and a file named vcfs.tar.gz. If that's the case, open the File Explorer with explorer.exe . (don't forget the .) and then please upload this file here (as long as it's not too large, hopefully).
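If one of the barcode directories holds no VCF, or more than one, the copy step above can fail. The steps above can be sketched a little more defensively as a single shell function. This is only a sketch: collect_vcfs is a hypothetical helper name, and unlike the commands above it keeps the original file name as a suffix (for example barcode31.medaka.annotated.vcf.gz) instead of renaming each file to the barcode alone.

```shell
# Copy every *vcf.gz found one level down into ./vcfs, prefixing each file
# with the name of its subdirectory (barcode), then pack ./vcfs into a tarball.
collect_vcfs() {
    mkdir -p vcfs
    for dir in */; do
        dir=${dir%/}                      # strip the trailing slash
        [ "$dir" = "vcfs" ] && continue   # do not descend into the output dir
        for vcf in "$dir"/*vcf.gz; do
            # Skip the unexpanded pattern when a directory has no VCF.
            [ -e "$vcf" ] || continue
            cp "$vcf" "vcfs/$dir.$(basename "$vcf")"
        done
    done
    tar czf vcfs.tar.gz vcfs
}
```

After defining the function, run collect_vcfs from inside the data directory. Running tar tzf vcfs.tar.gz afterwards lists what ended up in the archive.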
Right! Please find below vcfs.tar.gz
Actually, there is no need to upload files as it only takes one extra command to check them yourself. Could you please follow the steps below and let me know what you find.
Run wsl -d epi2me in PowerShell and go to the directory with the VCFs you created (if you deleted the directory, just re-run the steps from the previous post above).
cd /mnt/c/Users/Atajera_/epi2melabs/instances/wf-amplicon_01HC21YGHFCZ8X2RP1399VA2SP/work/c2/260a4423e104495b8e9df40601ae04/data/vcfs
Then search the VCFs for DP=0 with
zgrep 'DP=0' *
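A plain substring search also matches header lines and any other field that merely ends in DP=0. A stricter check, as a rough sketch, could look like the function below. It assumes that DP is stored in the INFO column (column 8) of the VCF; find_zero_depth is a hypothetical helper name.

```shell
# Print file name, contig and position of every record whose INFO column
# contains the exact key/value pair DP=0.
find_zero_depth() {
    for vcf in "$@"; do
        gzip -dc "$vcf" | awk -F'\t' -v name="$vcf" '
            # Skip header lines; wrap INFO in semicolons so that only a
            # complete DP=0 entry matches (not ADP=0 or DP=05).
            !/^#/ && index(";" $8 ";", ";DP=0;") > 0 {
                print name "\t" $1 "\t" $2
            }
        '
    done
}
```

Calling find_zero_depth *.vcf.gz in the vcfs directory prints one line per affected record, which shows directly which barcode is involved.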
Right! Found it.
What do you suggest I do to avoid this issue?
We will add a filter to remove such variants in the next release, which should go out early next week. Unless you desperately need the results for barcode31 before then, I think the easiest workaround for you is to just drop this barcode from the input and run the workflow only on the other barcodes for now.
Sure, thank you very much!
Looks like this is caused by an issue with how variants are annotated in Medaka. @dnharry, could you share the reads of barcode31 with us so that we can reproduce the problem and fix it upstream? Thanks!
I sure can, but it has multiple fastq files which sum to ~70 MB. The link below is to a zip of the fastq.gz files: https://drive.google.com/file/d/1K6RnuiihwN3u0f2p8RaHbhSpk9wWMs8_/view?usp=drive_link
I have also noticed that the medaka.consensus.fasta in the consensus contains the reference sequences instead of the sequences for the sample.
Does this mean that the report shows variants for a sample which are not reflected in the consensus? The consensus is generated by incorporating the variants found by medaka into the provided reference. So overall, the sequences should look fairly similar depending on how many variants were found.
Hi @dnharry,
I'm the lead developer of medaka, amongst other things. The Google Drive link you posted requires you to grant permissions; I have requested access.
Oh, I have granted access.
Oh okay, I see. As the reference, I provided the whole gene instead of just the portion of the gene that was amplified.
I will have to feed in the exact region then.
Hi @dnharry, To reproduce the problem, we would also need the reference you used (sorry, I forgot to mention this explicitly earlier).
However, it might actually be easier if you just share the input files that went into medaka directly. To find them, please open PowerShell again, run wsl -d epi2me, and navigate to the makeReport work dir as before:
cd /mnt/c/Users/Atajera_/epi2melabs/instances/wf-amplicon_01HC21YGHFCZ8X2RP1399VA2SP/work/c2/260a4423e104495b8e9df40601ae04/data
Nextflow relies on symlinks to make input files available to the relevant processes. We need to read the link of the VCF to find out where it was generated. The command below reads the link, gets the parent directory, and puts it into a .tar.gz archive (notice the added h option to tell tar to dereference links and the -C to tell it to change directories to avoid absolute paths in the archive).
tar czfh medaka-variant-inputs.tar.gz -C "$(dirname "$(readlink barcode31/medaka.annotated.vcf.gz)")" .
Then, if you run explorer.exe . again, you should see a file called medaka-variant-inputs.tar.gz. This should also be a lot smaller since the workflow downsamples reads before running medaka. Please upload this file here. Many thanks!
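To see which directory the tar command above will pack before actually creating the archive, the link resolution can be run on its own. A minimal sketch follows. link_parent_dir is a hypothetical helper name, and the sketch assumes the symlink target is an absolute path, since readlink without further options prints the target exactly as stored in the link.

```shell
# Print the directory that contains the target of a symlink.
link_parent_dir() {
    dirname "$(readlink "$1")"
}
```

Running link_parent_dir barcode31/medaka.annotated.vcf.gz from the data directory should print the work directory of the process that wrote the VCF. Once the archive exists, tar tzf medaka-variant-inputs.tar.gz lists its contents without unpacking it.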
Oh, no worries. Done, but the size is ~40 MB, which exceeds GitHub's 25 MB limit. So here goes: https://drive.google.com/file/d/1jFRB69GV0tLfjtHSL0D3fbpxrH98tLZg/view?usp=sharing
Hi @dnharry,
This issue will be resolved in the next release, to appear shortly.
Thank you!
Hi @dnharry, the new release (v0.4.1) should fix this. If the problem persists, please let us know and re-open this issue.
Operating System
Windows 11
Other Linux
No response
Workflow Version
v0.3.5
Workflow Execution
EPI2ME Desktop application
EPI2ME Version
v5.1.3
CLI command run
No response
Workflow Execution - CLI Execution Profile
None
What happened?
I have fastq files in separate sample folders, and the workflow runs successfully when I specify 2 folders. I tried batch running on 10 folders and received the error below. I reduced the number to 5 but still received this error. Please help, thank you!
Relevant log output
Application activity log entry