UPHL-BioNGS / Wastewater-genomic-analysis


Errors in fastq_dir_to_samplesheet.py #2

Open DrB-S opened 1 year ago

DrB-S commented 1 year ago

I am running the wastewater pipeline. I have created the sample sheet, and it is in the directory. However, the script cannot find the sample sheet, and when it tries to create a new one, I get the following error at line 96 of fastq_dir_to_samplesheet.py:

Tue Jul 18 13:24:18 MST 2023 : Run Wastewater sample data with viralrecon for run Wastewater_17Jul2023
Tue Jul 18 13:24:18 MST 2023 : First create input samplesheet for viralrecon pipeline
Tue Jul 18 13:24:18 MST 2023 : Wastewater_17Jul2023_samplesheet.csv does not exist. Creating samplesheet required to run viralrecon
  File "/data/home/becksts/.nextflow/assets/UPHL-BioNGS/Wastewater-genomic-analysis/conf-files/fastq_dir_to_samplesheet.py", line 96
    glob.glob(os.path.join(fastq_dir, f"*{extension}"), recursive=False)
    ^
SyntaxError: invalid syntax
Tue Jul 18 13:24:18 MST 2023 : Checking if the viralrecon pipeline completed successfully
Tue Jul 18 13:24:18 MST 2023 : Oops .. something went wrong and pipeline stopped
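The caret in this traceback sits on an f-string (`f"*{extension}"`). f-strings were added in Python 3.6 (PEP 498), so one plausible cause worth ruling out is that `fastq_dir_to_samplesheet.py` is being launched by an older interpreter, such as Python 2, which rejects the whole file at parse time. The sketch below is a hypothetical standalone check, not part of the repository: it reports the interpreter version and whether the quoted line 96 compiles. A version check placed inside the failing script itself would never run, so this has to be executed separately with the same `python` the wrapper calls.

```python
# Hypothetical standalone diagnostic; keep it free of f-strings so that
# it can itself be parsed and run by an old interpreter.
from __future__ import print_function

import sys

# Line 96 of fastq_dir_to_samplesheet.py, copied from the traceback above.
LINE_96 = 'glob.glob(os.path.join(fastq_dir, f"*{extension}"), recursive=False)'

# f-strings (PEP 498) were introduced in Python 3.6.
MIN_FSTRING_VERSION = (3, 6)


def supports_fstrings(version_info=sys.version_info):
    """Return True if an interpreter of this version can parse f-strings."""
    return tuple(version_info[:2]) >= MIN_FSTRING_VERSION


def parses(source):
    """Return True if `source` compiles as an expression on this interpreter.

    compile() only checks syntax; names such as glob, os, fastq_dir and
    extension are never looked up, which is enough to reproduce the error.
    """
    try:
        compile(source, "<line 96>", "eval")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("Interpreter:", sys.version.split()[0])
    print("Supports f-strings:", supports_fstrings())
    print("Line 96 parses:", parses(LINE_96))
```

If "Line 96 parses" comes back False, pointing the wrapper at a Python 3.6+ interpreter would be the first thing to try.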

poojasgupta commented 1 year ago

@DrB-S It looks like an issue with the extension of your fastq file name. The default is set to 'R1_001.fastq.gz'. Could you share an example name of the fastq files that you are trying to run?

DrB-S commented 1 year ago

Sure:

AB0313_S35_L001_R1_001.fastq.gz
AB0313_S35_L001_R2_001.fastq.gz

Stephen M. Beckstrom-Sternberg, PhD Bioinformatics Contractor

Arizona State Public Health Lab Arizona Department of Health Services Cell: (602) 653-5011 Email: @.***


-- CONFIDENTIALITY NOTICE:  This e-mail is the property of the Arizona Department of Health Services and contains information that may be PRIVILEGED, CONFIDENTIAL, or otherwise exempt from disclosure by applicable law.  It is intended only for the person(s) to whom it is addressed.  If you have received this communication in error, please do not retain or distribute it.  Please notify the sender immediately by e-mail at the address shown above and delete the original message.  Thank you.  
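With the default extension `R1_001.fastq.gz` quoted by the maintainer, the glob pattern on line 96 (`f"*{extension}"`) should match the R1 file of each pair and nothing else, which fits the maintainer's reading that these names look correct. The sketch below reproduces only that glob call in a temporary directory holding the two file names listed above; the helper name `find_read1_files` is hypothetical, and the temporary directory stands in for the real run folder.

```python
import glob
import os
import tempfile

DEFAULT_EXTENSION = "R1_001.fastq.gz"  # default quoted in this thread


def find_read1_files(fastq_dir, extension=DEFAULT_EXTENSION):
    """Hypothetical helper mirroring the glob call on line 96.

    Returns the sorted base names of files in fastq_dir ending in `extension`.
    """
    pattern = os.path.join(fastq_dir, f"*{extension}")
    return sorted(
        os.path.basename(path) for path in glob.glob(pattern, recursive=False)
    )


if __name__ == "__main__":
    names = ["AB0313_S35_L001_R1_001.fastq.gz", "AB0313_S35_L001_R2_001.fastq.gz"]
    with tempfile.TemporaryDirectory() as fastq_dir:
        for name in names:
            # Empty placeholder files: glob only looks at the names.
            open(os.path.join(fastq_dir, name), "w").close()
        print(find_read1_files(fastq_dir))
        # prints ['AB0313_S35_L001_R1_001.fastq.gz']
```

Because the match works on a modern interpreter, a failure at this line points to the runtime environment rather than to the file names.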

poojasgupta commented 1 year ago

The file names look correct so it should ideally work. As the scripts rely on a specific directory structure we use here at UPHL, I would also make sure of that. Are you running just the viralrecon script? Could you please share the full log of the command you ran and its output?

DrB-S commented 1 year ago

No. I am running the script that calls all three scripts.

Here is my command line:

sh ~/.nextflow/assets/UPHL-BioNGS/Wastewater-genomic-analysis/run_wwtp_sequencing_analysis.sh Wastewater_17Jul2023

And below is the output (I'm not sure why singularity is not recognized as a configuration profile):

Purpose: Bash script to automate wastewater sequencing analysis. Consists of three individual scripts
1) WWP_seq_initialize_analysis.sh - Set up folder structure for running for sequencing data analysis and cleans up fastq filenames for NCBI submission.
2) run_viralrecon.sh - Run viralrecon bioinformatic pipeline with wastewater sequencing data.
3) run_freyja_vrn_noBoot.sh - Run Freyja with BAM files from viralrecon and generate final output files for Microreact visualization

Usage: run_wwtp_sequencing_analysis.sh

Last updated on June 05,2023

Thu Jul 20 15:26:45 MST 2023 : Step 1/3. Set up wastewater sequencing analysis

Purpose: 1) This is the first step in script run_wwtp_sequencing_analysis_v2 which sets up directory structure for initiating Wastewater sequencing run analysis and any downstream analysis.
3) Generate ncbi submission folder that can be directly used for uploading files to NCBI and create a csv file used for uploading into Data-flo to extract biosample and SRA metadata tables.

Usage: sh WWP_seq_new_run_auto.sh | tee -a WWP_seq_new_run_auto.log
Last updated on June 5,2023

Thu Jul 20 15:26:45 MST 2023 : Fastq generation step is not yet completed for run Wastewater_17Jul2023. Exiting...
Thu Jul 20 15:26:45 MST 2023 : Step 2/3. Run viralrecon

Purpose: Bash script to run viralrecon bioinformatic pipeline with wastewater sequencing data.

Usage: run_viralrecon.sh | tee -a viralrecon.log

Last updated on May 16,2023

Thu Jul 20 15:26:45 MST 2023 : Run Wastewater sample data with viralrecon for run Wastewater_17Jul2023
Thu Jul 20 15:26:45 MST 2023 : First create input samplesheet for viralrecon pipeline
Thu Jul 20 15:26:45 MST 2023 : /data/Sequence_analysis/Wastewater-genomic-analysis//Wastewater_17Jul2023/analysis/viralrecon/Wastewater_17Jul2023_samplesheet.csv already exists, starting viralrecon
Thu Jul 20 15:26:45 MST 2023 : Running viralrecon
N E X T F L O W ~ version 23.04.2
Unknown configuration profile: 'singularity'
Thu Jul 20 15:26:47 MST 2023 : Checking if the viralrecon pipeline completed successfully
Thu Jul 20 15:26:47 MST 2023 : Oops .. something went wrong and pipeline stopped
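This log shows the one location where the wrapper looks for the samplesheet: `/data/Sequence_analysis/Wastewater-genomic-analysis//Wastewater_17Jul2023/analysis/viralrecon/Wastewater_17Jul2023_samplesheet.csv`. Since the scripts rely on UPHL's directory structure, a sample sheet saved anywhere else would presumably be reported as missing, as in the first log. The sketch below only rebuilds that expected path from the layout visible in this log line; the base directory constant and the helper name `expected_samplesheet_path` are assumptions for illustration, not the wrapper's actual code. The doubled slash in the logged path is harmless on Linux, where `a//b` resolves the same as `a/b`.

```python
from pathlib import Path

# Assumed base directory, copied from the log line above; adjust as needed.
BASE_DIR = Path("/data/Sequence_analysis/Wastewater-genomic-analysis")


def expected_samplesheet_path(run_name, base_dir=BASE_DIR):
    """Hypothetical helper for the layout seen in the log:
    <base>/<run>/analysis/viralrecon/<run>_samplesheet.csv
    """
    return (
        Path(base_dir) / run_name / "analysis" / "viralrecon"
        / f"{run_name}_samplesheet.csv"
    )


if __name__ == "__main__":
    path = expected_samplesheet_path("Wastewater_17Jul2023")
    status = "exists" if path.is_file() else "does not exist"
    print(f"{path} {status}")
```

Running this on the analysis host before the wrapper is one quick way to confirm the sample sheet sits where the scripts expect it.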

I also noticed that YPHL_viralrecon.config seems problematic when I view it in Visual Studio Code, which reports “Content is not allowed in prolog”.

Thanks for any suggestions,
