elderberry-smells / GBS_snakemake_pipeline

a pipeline based on the utility of snakemake to generate a vcf file from paired end sequencing data obtained from Illumina platforms
GNU General Public License v3.0

missing files #5

Open ifoo1213 opened 11 months ago

ifoo1213 commented 11 months ago

Hi, thanks for your pipeline. I installed it following the instructions, but the run fails with a missing-files error. I believe the demultiplex stage is not taking its input, and I don't know why demultiplex.output.log shows the following:

Running Command: $ python3 scripts/fastq_demultiplex.py -f ~/GBS_data/samples/OS0101FB_R1.fastq.gz -b ~/GBS_data/barcode_GBS.txt
Started demultiplexing @: 2023-10-07 13:41:02.461436
Finished demultiplexing @: 2023-10-07 13:41:02.486359

gbsx001  TGACGCCATGCA   0
gbsx002  CAGATATGCA     0
gbsx003  GAAGTGTGCA     0
gbsx004  TAGCGGATTGCA   0
gbsx005  TATTCGCATTGCA  0
gbsx006  ATAGATTGCA     0

Is it supposed to run python3 ~/gbs/GBS_snakemake_pipeline/workflow/scripts/PE_fastq_demultiplex_AAFC.py -f fastq.gz -b barcode.txt -s samplefile.txt instead?

My samplesheet.txt looks like this:

Sample_number  Index_name  Sample_ID  Reference_path
1              gbsx001     OS0101FB1  /hifiasm_RJ/Hifiasm_RJ.merge.fasta
2              gbsx002     OS0101FB2  /hifiasm_RJ/Hifiasm_RJ.merge.fasta
3              gbsx003     OS0101FB3  /hifiasm_RJ/Hifiasm_RJ.merge.fasta

My barcode file is the same as the example.

The log output is: MissingOutputException in line 1 of ~/gbs/GBS_snakemake_pipeline/workflow/rules/demultiplex.smk

Would you mind taking a look and maybe giving some suggestions? Thank you.

elderberry-smells commented 11 months ago

Just to make sure: are the barcodes the same as the ones in the barcode directory (the sequences, not just the names)? https://github.com/elderberry-smells/GBS_snakemake_pipeline/tree/master/workflow/resources/barcodes

Otherwise would you be able to put the full error in here for me to see?

Some things that have caused issues in the past:

Make sure you have permission to create new directories in the file system you are working on. chmod the directory where the fastq files live to rwx.

The naming convention is unfortunately very specific. The file names have to contain a (capital) R1 and R2, as in library_R1.fastq.gz and library_R2.fastq.gz. I suspect this might be your issue, though I can't be sure without seeing the full error message and config file.

I would also use a specific barcode file if you aren't using the full plate. A new barcode file containing only the indexes found in the plate might help the process/stats in demultiplexing, as the program will try to estimate a % called and may error if it has to divide by 0 (index not in list).
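For the naming convention point above, a quick check along these lines might help before launching the pipeline. This is only a sketch: the function name `check_pairs` and the regular expression are illustrative assumptions based on the `library_R1.fastq.gz` / `library_R2.fastq.gz` pattern described in this thread, not code from the pipeline.

```python
import re

# Assumed pattern: <library>_R1.fastq.gz or <library>_R2.fastq.gz, capital R.
READ_FILE = re.compile(r"^(?P<library>.+)_R(?P<read>[12])\.fastq\.gz$")


def check_pairs(filenames):
    """Return (paired, unpaired, misnamed) for a list of file names.

    paired   -- libraries that have both an R1 and an R2 file
    unpaired -- libraries that have only one of the two reads
    misnamed -- file names that do not match the assumed pattern
    """
    reads_by_library = {}
    misnamed = []
    for name in filenames:
        match = READ_FILE.match(name)
        if match is None:
            misnamed.append(name)
            continue
        reads_by_library.setdefault(match.group("library"), set()).add(match.group("read"))

    paired = sorted(lib for lib, reads in reads_by_library.items() if reads == {"1", "2"})
    unpaired = sorted(lib for lib, reads in reads_by_library.items() if reads != {"1", "2"})
    return paired, unpaired, misnamed
```

Calling it on the names returned by `os.listdir()` for the samples directory would flag any library that is missing its R2 file or uses a lowercase `r`.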
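For the plate-specific barcode file, something like the following could build the reduced list. It is a rough sketch under two assumptions that should be checked against the real files: the sample sheet is whitespace-separated with a header containing `Index_name`, and each barcode line starts with the index name followed by the sequence. The helper names `used_indexes` and `subset_barcodes` are hypothetical.

```python
def used_indexes(samplesheet_lines):
    """Collect the Index_name values from a whitespace-separated sample sheet."""
    rows = [line.split() for line in samplesheet_lines if line.strip()]
    header, records = rows[0], rows[1:]
    column = header.index("Index_name")  # assumed header name
    return {record[column] for record in records}


def subset_barcodes(barcode_lines, keep):
    """Keep only barcode lines whose first field (assumed index name) is in `keep`."""
    kept = []
    for line in barcode_lines:
        fields = line.split()
        if fields and fields[0] in keep:
            kept.append(line)
    return kept
```

Reading both files with `open(...).read().splitlines()` and writing the result of `subset_barcodes` to a new file would give a barcode file with only the indexes present in the plate.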

ifoo1213 commented 11 months ago

slurm-1745729.txt

Hi Brian, thanks for your reply. I checked the issues you mentioned above: the directory is rwx and the file name ends in _R1.fastq.gz, so both are correct. I did also hit the divide-by-zero error, but I changed the code a little (adding 1 to the total count) to avoid it, because I do have blank cells in the plate. Please see the attached log output. Thanks.
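As a side note on the divide-by-zero workaround: adding 1 to the total avoids the exception but also shifts every percentage slightly. An explicit zero check is one alternative. The function below is a hypothetical illustration, not the code in fastq_demultiplex.py, and the name `percent_called` is an assumption.

```python
def percent_called(count, total):
    """Return `count` as a percentage of `total`, or 0.0 when `total` is zero.

    Guarding the zero case leaves non-empty totals unchanged, unlike
    adding 1 to the denominator.
    """
    if total == 0:
        return 0.0
    return 100.0 * count / total
```

With this guard, a blank well with no reads reports 0.0 instead of raising ZeroDivisionError.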