Closed MaestSi closed 4 years ago
Hey Simone -- for guppy (and porechop) demuxing, we parse the barcodes from the FASTQ contents, not the filename/dirname.
I would guess (but have no proof) that this is due to guppy keeping the Kikwit fastq open while it is still writing to it. Rampart will only read each fastq after it is closed, in order to avoid reading files which are still being written to. Do the other samples have accurate read counts, or are they (e.g.) multiples of 1000?
Hi, guppy_barcoder was run before rampart, so no fastq files should be open. Here is the number of reads processed by rampart:
barcode01: 3488 (all of them)
barcode03: 0 (out of 213)
barcode04: 1251 (out of 5251)
So for barcode04, too, 4000 reads are missing. Rampart also prints some warnings:
[warning] Detected "new" FASTQ fastq_runid_e9e588bddbea1984a1556c61a8d53decbecf82e2_0 which has already been seen!
[warning] Detected "new" FASTQ fastq_runid_e9e588bddbea1984a1556c61a8d53decbecf82e2_0 which has already been seen!
[warning] Detected "new" FASTQ fastq_runid_e9e588bddbea1984a1556c61a8d53decbecf82e2_0 which has already been seen!
[warning] Detected "new" FASTQ fastq_runid_e9e588bddbea1984a1556c61a8d53decbecf82e2_1 which has already been seen!
So, my guess is that rampart doesn't like files having the same name, but being in different folders due to different barcodes. Since this is how guppy_barcoder names fastqs, the folder name should probably also be used when naming the csv files in the annotations folder.
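One way to check this guess is to list the fastq file names that occur in more than one barcode folder. The sketch below is illustrative only: the function name list_duplicate_fastq_names is hypothetical, and it assumes the layout described above (one sub-folder per barcode directly under the demultiplexing directory, with .fastq files inside).

```shell
#!/bin/bash
# Hypothetical helper: print fastq base names that appear in more than one
# sub-folder of the demultiplexing directory passed as $1.
list_duplicate_fastq_names() {
  find "$1" -mindepth 2 -maxdepth 2 -type f -name "*.fastq" \
    | while IFS= read -r path; do basename "$path"; done \
    | sort | uniq -d
}
```

Calling list_duplicate_fastq_names on the demultiplexing directory prints one line per colliding file name; empty output means every file name is unique across barcode folders.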
"my guess is that rampart doesn't like files having the same name, but being in different folders due to different barcodes"
Spot on. I'll include the full path in the list of seen files. Thanks for tracking this down!
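To illustrate why keying the list of seen files on the full path avoids the collision, here is a minimal sketch. It is not rampart's actual code (rampart is a Node.js application and its bookkeeping may differ); the variable seen and the helper is_new are hypothetical names used only for this example.

```shell
#!/bin/bash
seen=""   # newline-separated list of keys that have already been processed

# Hypothetical helper: succeeds (status 0) the first time a key is offered,
# fails (status 1) if the same key has been recorded before.
is_new() {
  if printf '%s' "$seen" | grep -Fxq -- "$1"; then
    return 1
  fi
  seen="${seen}${1}
"
}
```

If the key is the base name, a file in barcode03 with the same name as one already read from barcode01 is rejected as already seen. If the key is the full path, both files are accepted, and only a genuine repeat of the same path is rejected.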
You're welcome!
Hi @MaestSi -- if you have time, would you mind checking these data using rampart 1.2.0rc1 (the newest, pre-release version)? It should be working now. You can install this in your conda environment by running:
conda install artic-network/label/test::rampart==1.2.0rc1
Hi, I tried installing and running it but I got this error:
node: /home/simone/miniconda3/envs/artic-rampart/bin/../lib/libcrypto.so.1.1: version `OPENSSL_1_1_1b' not found (required by node)
Hi, I tried reinstalling the conda version from the master branch, and running rampart --help showed the same error. However, I then followed the "Install from source" instructions and it worked perfectly. The issue with fastqs having the same name also looks solved.
Before, I used to rename them with:
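#!/bin/bash
# Prefix every fastq with the name of its barcode directory so that
# file names are unique across barcodes.
demultiplexing_dir=$1
for bc in $(find "$demultiplexing_dir" -mindepth 1 -maxdepth 1 -type d -name "barcode*") ; do
bc_id=$(basename "$bc")
for f in $(find "$bc" -maxdepth 1 -type f -name "*.fastq*") ; do
curr_dir=$(dirname "$f")
mv "$f" "$curr_dir/${bc_id}_$(basename "$f")"
done
done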
but it looks like there is no need to do it anymore. Only the conda installation remains to be fixed. Thanks, Simone
Thanks - this is good information to have. Will fix the conda install...
This should be fixed in rampart v1.2.0, now available on conda. Please reopen if you have this issue again!
Hi, I am trying out rampart v1.1.0, starting from demultiplexed reads. I performed demultiplexing with guppy_barcoder using the --require_barcodes_both_ends option, as suggested here. I am using example_data, and these reads survive the demultiplexing. I have modified the run_configuration.json file accordingly:
When running rampart, I found that reads from the Kikwit strain are never loaded. Is this due to the small number of reads (213) for that sample, or is there something wrong in what I am doing? Moreover, could you please confirm that the barcode names specified in the run_configuration.json file should match the barcode names in the read headers, and not the directory names? As a test, I tried renaming directory barcode01 in the demultiplexing folder to, say, B1, and the results were the same. Thanks in advance, Simone