PhyloGrok / VCFgenerator

Automated variant calling app for NextGen evolutionary genomics
GNU General Public License v3.0

Variant calling workflow - script automation #3

Open PhyloGrok opened 1 year ago

PhyloGrok commented 1 year ago

The workflow has three parts, and each part should have its own shell script file:

  1. Retrieving SRA files (.fastq format) using EDirect and SRA-toolkit.
  2. Quality control: trimmomatic and fastqc.
  3. Assembly and variant calling: bwa, samtools.

PhyloGrok commented 1 year ago

The .fastq sequence retrieval part now looks robust. Storage may become an issue, and we will likely need to request additional storage for larger datasets. The .sam, .bam, and .bcf files are also very large, so we'll need to delete them as soon as each one has been processed to conserve storage.
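
The cleanup idea could look roughly like the sketch below. It assumes each sample's files sit somewhere under one results directory and start with the accession followed by `.` or `_`; the function name `remove_intermediates` is hypothetical. Only .sam, .bam, and .bcf files are deleted, so a final .vcf is left in place.

```python
from pathlib import Path

# File types named in the comment above as large intermediates.
# Final .vcf files are deliberately not listed here.
INTERMEDIATE_SUFFIXES = {".sam", ".bam", ".bcf"}


def remove_intermediates(results_dir, accession):
    """Delete intermediate files for one accession and return the bytes freed.

    Assumes file names start with the accession followed by '.' or '_',
    for example 'SRR9025118.aligned.sam' or 'SRR9025118_raw.bcf'.
    """
    freed = 0
    # Materialise the matches first so files are not deleted mid-iteration.
    matches = list(Path(results_dir).rglob(f"{accession}[._]*"))
    for path in matches:
        if path.is_file() and path.suffix in INTERMEDIATE_SUFFIXES:
            freed += path.stat().st_size
            path.unlink()
    return freed
```

Calling this once per sample, right after its .vcf has been written, keeps at most one sample's intermediates on disk at a time.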

Currently waiting for Lloyd's Python scripts, which will take user inputs and control the execution of the workflow steps.
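
As a rough illustration of what "controlling the execution of the steps" might involve, the sketch below runs a list of commands in order and stops at the first failure. It is not Lloyd's script; the function name `run_stages` is hypothetical.

```python
import subprocess


def run_stages(commands):
    """Run each command in order and stop at the first non-zero exit code.

    Each command is a list such as ["bash", "some_script.sh"].
    Returns the list of commands that completed successfully.
    """
    completed = []
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"Stage failed with code {result.returncode}: {' '.join(command)}")
            break
        completed.append(command)
    return completed
```

With three shell scripts, one per workflow part, the call would be `run_stages([["bash", "part1.sh"], ["bash", "part2.sh"], ["bash", "part3.sh"]])`, where the script names are placeholders. Stopping early avoids running variant calling on reads that were never downloaded or trimmed.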

PhyloGrok commented 1 year ago

This Stack Overflow question has some initial ideas; they use the Python function subprocess.call(): https://stackoverflow.com/questions/32085956/pass-a-variable-from-python-to-shell-script
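
A minimal sketch of that idea: a Python value is handed to a shell script as a positional argument. It uses `subprocess.run`, which the Python documentation recommends over the older `subprocess.call`; the helper name `run_script` is hypothetical.

```python
import subprocess


def run_script(script_path, *args):
    """Run a bash script with the given arguments.

    Inside the script the values are available as $1, $2, and so on.
    Returns (exit_code, captured_stdout).
    """
    command = ["bash", str(script_path), *[str(a) for a in args]]
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout
```

Passing the command as a list, rather than one string with `shell=True`, means values containing spaces or shell metacharacters reach the script unchanged.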

PhyloGrok commented 1 year ago

Here's a different one where they seem to pass the output of test.py to a shell variable. https://stackoverflow.com/questions/2796932/how-do-i-pass-a-python-variable-to-bash

PhyloGrok commented 1 year ago

This one has a simple example of how the script should run, where Python variables are passed to the bash shell. https://unix.stackexchange.com/questions/466190/passing-python-variable-to-embedded-shell-script
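
One way to pass Python variables to an embedded shell script is through environment variables, sketched below. The variable names `ACCESSION` and `OUT_DIR` and the helper `run_embedded` are made up for illustration.

```python
import os
import subprocess

# A shell snippet embedded in the Python file; it reads its inputs
# from environment variables instead of positional arguments.
EMBEDDED_SCRIPT = 'echo "Processing ${ACCESSION} into ${OUT_DIR}"'


def run_embedded(script, variables):
    """Run a bash snippet with `variables` exported into its environment."""
    env = {**os.environ, **{key: str(value) for key, value in variables.items()}}
    result = subprocess.run(
        ["bash", "-c", script],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For example, `run_embedded(EMBEDDED_SCRIPT, {"ACCESSION": "SRR9025102", "OUT_DIR": "/tmp/out"})` returns the echoed line with both values substituted by the shell.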

LloydJonesIII commented 1 year ago

./R03.sh: line 21: datasets: command not found

New error found. I will continue to work on this over the weekend when I can.
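
"command not found" means the shell could not find `datasets` on the PATH of the environment that ran R03.sh. A small pre-flight check in the Python controller could catch this before any script starts. The sketch below is one way to do it; the helper name `missing_tools` is hypothetical.

```python
import shutil


def missing_tools(tool_names):
    """Return the tools from `tool_names` that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]
```

Running `missing_tools(["datasets", "fasterq-dump", "trimmomatic", "fastqc", "bwa", "samtools"])` at startup would list anything that still needs to be installed or activated. That tool list is assembled from names mentioned in this thread and may not match the scripts exactly.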

LloydJonesIII commented 1 year ago

Completed the first working version of the Python hub.

LloydJonesIII commented 1 year ago

Made major headway on the Python split function, which splits our SRA lists so that multiple Trimmomatic instances can run in parallel.
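
The actual split function is not shown in this thread. As an illustration only, splitting a list of accessions into a fixed number of nearly equal parts can be sketched like this (`split_accessions` is a hypothetical name):

```python
def split_accessions(accessions, n_chunks):
    """Split a list into `n_chunks` order-preserving parts of near-equal size.

    The first (len % n_chunks) parts get one extra item. If there are more
    chunks than items, the trailing chunks are empty lists.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(accessions), n_chunks)
    chunks = []
    start = 0
    for index in range(n_chunks):
        size = base + (1 if index < extra else 0)
        chunks.append(accessions[start:start + size])
        start += size
    return chunks
```

Each chunk can then be written to its own list file and handed to a separate Trimmomatic run.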

LloydJonesIII commented 1 year ago

To test the Python hub code (trial3.py), you need the following function located in the same directory:

LloydJonesIII commented 1 year ago

Current list of resources used (links):

https://stackoverflow.com/questions/48209410/cant-open-sh-file
https://stackoverflow.com/questions/32085956/pass-a-variable-from-python-to-shell-script
https://stackoverflow.com/questions/65153137/multiple-inputs-using-subprocess-run-in-python-3-7
https://stackoverflow.com/questions/17742789/running-multiple-bash-commands-with-subprocess
https://stackoverflow.com/questions/7585435/best-way-to-convert-string-to-bytes-in-python-3
https://stackoverflow.com/questions/3172470/actual-meaning-of-shell-true-in-subprocess#:~:text=Setting%20the%20shell%20argument%20to,before%20the%20command%20is%20run.
https://unix.stackexchange.com/questions/242334/notepad-adds-r-to-shell-scripts
https://support.nesi.org.nz/hc/en-gb/articles/218032857-Converting-from-Windows-style-to-UNIX-style-line-endings#:~:text=Converting%20using%20Notepad%2B%2B&text=To%20write%20your%20file%20in,with%20UNIX%2Dstyle%20line%20endings.
https://stackoverflow.com/questions/31786287/how-to-split-large-text-file-in-windows
https://www.tutorialspoint.com/How-to-read-a-file-from-command-line-using-Python#:~:text=Reading%20a%20file%20from%20command,file%20and%20read%20its%20contents.
https://stackoverflow.com/questions/17255737/importing-variables-from-another-file
https://www.pythonforbeginners.com/files/the-fastest-way-to-split-a-text-file-using-python

LloydJonesIII commented 1 year ago

The Python split.py concept has been tested and confirmed to work with the Trimmomatic command-line shell scripts.
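
The thread does not show how the parallel Trimmomatic instances are launched. One possible approach, sketched here with a hypothetical helper `run_in_parallel`, is to start one subprocess per chunk from a small thread pool:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(commands, max_workers=2):
    """Run several commands concurrently and return their exit codes.

    Exit codes are returned in the same order as `commands`. Threads are
    enough here because the heavy work happens in the child processes.
    """
    def run_one(command):
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

Each command would be the shell call for one chunk of the split list. `max_workers` should stay within what the machine's CPU and memory can handle.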

LloydJonesIII commented 1 year ago

To get the hub Python script to work as intended, a higher-level Python script has been added to the script order so that the user is not constantly prompted for inputs, which is not what we wanted. The new script is called

LloydJonesIII commented 1 year ago

Finished creating and testing the head Python controller, which will control all downstream shell and Python scripts.
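
The controller itself is not included in this thread. As a sketch of the "ask once, pass everywhere" idea, the example below collects inputs a single time with `argparse` (a substitute for interactive prompts) and turns them into an argument list for downstream scripts. The option names `--sra-list`, `--work-dir`, and `--threads` are assumptions.

```python
import argparse


def parse_inputs(argv=None):
    """Collect all workflow inputs once, from the command line."""
    parser = argparse.ArgumentParser(description="Collect workflow inputs once.")
    parser.add_argument("--sra-list", required=True,
                        help="text file with one SRA accession per line")
    parser.add_argument("--work-dir", required=True,
                        help="directory where all outputs are written")
    parser.add_argument("--threads", type=int, default=1,
                        help="threads to give each tool")
    return parser.parse_args(argv)


def stage_arguments(inputs):
    """Build the argument list handed to every downstream script."""
    return [inputs.sra_list, inputs.work_dir, str(inputs.threads)]
```

Every downstream shell or Python script then receives the same three values, so none of them needs to prompt the user again.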

LloydJonesIII commented 1 year ago

Code error found.

LloydJonesIII commented 1 year ago

Troubleshooting the variant calling automation step; currently stuck on this set of errors:

[bwa_index] Pack FASTA... 0.01 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.39 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.16 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index ../../media/volume/sdb/attempt7/assembly/reference/ref_genome
[main] Real time: 0.617 sec; CPU: 0.574 sec
SRR9025102 Variant calling process has begun
[M::bwa_idx_load_from_disk] read 0 ALT contigs
'.::main_mem] fail to open file `
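
In the last line of that output, `'.` appears in front of the message. That pattern is typical of a file name ending in a carriage return, which makes the terminal jump back to the start of the line before printing the closing characters. This is one possible cause, not a confirmed diagnosis, though it would fit the Windows line-ending links listed earlier in the thread. If an input file such as the accession list was saved with Windows line endings, a sketch like the following could normalise it before the shell scripts read it (`to_unix_line_endings` is a hypothetical helper):

```python
from pathlib import Path


def to_unix_line_endings(path):
    """Rewrite a text file in place with LF line endings.

    Returns True if the file contained CRLF endings and was changed,
    False if it was already clean.
    """
    target = Path(path)
    original = target.read_bytes()
    fixed = original.replace(b"\r\n", b"\n")
    if fixed == original:
        return False
    target.write_bytes(fixed)
    return True
```

Running this on each list file and shell script copied over from Windows, before the workflow starts, removes the stray carriage returns in one pass.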

LloydJonesIII commented 1 year ago

Work completed Friday 07-07-23

LloydJonesIII commented 1 year ago

New error found that needs to be worked through:

[main] CMD: bwa mem ../../media/volume/sdb/attempt11/assembly/reference/ref_genome.fasta ../../media/volume/sdb/attempt11/fastq/trimmed/SRR9025118_1.trim.fastq.gz ../../media/volume/sdb/attempt11/fastq/trimmed/SRR9025118_2.trim.fastq.gz
[main] Real time: 25.855 sec; CPU: 26.814 sec
[E::hts_open_format] Failed to open file ../../media/volume/sdb/attempt11/assembly/results/sam/SRR9025118.aligned.sam
samtools view: failed to open "../../media/volume/sdb/attempt11/assembly/results/sam/SRR9025118.aligned.sam" for reading: No such file or directory
[E::fai_build3_core] Failed to open the file ../../media/volume/sdb/attempt11/assembly/reference/ref_genome
[E::hts_open_format] Failed to open file ../../media/volume/sdb/attempt11/assembly/results/bcf/SRR9025118_raw.bcf : No such file or directoryvolume/sdb/attempt11/assembly/results/bcf/SRR9025118_raw.bcf
Can't open ../../media/volume/sdb/attempt11/assembly/results/vcf/SRR9025118.vcf: No such file or directory at /home/exouser/anaconda3/bin/vcfutils.pl line 265.
SRR9025118 Variant calling process has finished
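
Two things stand out in this log. bwa mem was given `ref_genome.fasta`, while the later step tried to open `ref_genome` without the extension. The .sam file was also missing when samtools tried to read it, which would happen if, for instance, the results directories did not exist yet. Both are readings of the log, not confirmed causes. The sketch below builds every per-sample path in one place from a single reference name and creates the output directories first. The helper name `prepare_sample_paths` is hypothetical; the directory and file names are copied from the log.

```python
from pathlib import Path

# Result sub-directories that appear in the log above.
RESULT_SUBDIRS = ("sam", "bcf", "vcf")


def prepare_sample_paths(assembly_dir, accession):
    """Create the result directories and return the paths for one sample."""
    assembly = Path(assembly_dir)
    results = assembly / "results"
    for sub in RESULT_SUBDIRS:
        (results / sub).mkdir(parents=True, exist_ok=True)
    return {
        # One definition of the reference path, reused by every step.
        "reference": assembly / "reference" / "ref_genome.fasta",
        "sam": results / "sam" / f"{accession}.aligned.sam",
        "bcf": results / "bcf" / f"{accession}_raw.bcf",
        "vcf": results / "vcf" / f"{accession}.vcf",
    }
```

If these paths are passed on to the shell step, the index, alignment, and calling commands cannot disagree about the reference file name.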

LloydJonesIII commented 1 year ago

07-14-23 (12-3pm)

LloydJonesIII commented 1 year ago

07-18-23

LloydJonesIII commented 1 year ago
LloydJonesIII commented 1 year ago
LloydJonesIII commented 1 year ago

07-21-23

LloydJonesIII commented 1 year ago

07-24-23

LloydJonesIII commented 1 year ago

07-25-23

to see limits: re-run with '-x' option.

=============================================================
An error occurred during processing.
A report was generated into the file '/home/intern4/ncbi_error_report.txt'.
If the problem persists, you may consider sending the file to 'sra-tools@ncbi.nlm.nih.gov' for assistance.

fasterq-dump quit with error code 3
Archive: ../../media/volume/sdb/BigRun1/assembly/2242.zip
  inflating: ../../media/volume/sdb/BigRun1/assembly/reference/README.md
  inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/assembly_data_report.jsonl
  inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/GCF_004799605.1_ASM479960v1_genomic.fna
  inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/protein.faa
  inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/cds_from_genomic.fna
  inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/genomic.gff
  inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/genomic.gtf
  inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/genomic.gbff
  inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/sequence_report.jsonl
  inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/dataset_catalog.json
Gzip process has begun

gzip: ../../media/volume/sdb/BigRun1/fastq/SRR19515813.fastq.gz: No space left on device
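
"No space left on device" means the volume filled up during the run. A free-space check before each download or compression step would let the controller stop cleanly before a partial file is written. A minimal sketch, with a hypothetical helper name:

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the volume holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes
```

For example, `has_free_space("../../media/volume/sdb", 50 * 1024**3)` checks for 50 GiB before starting the next accession. The threshold is an arbitrary placeholder and would need tuning to the dataset sizes.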

LloydJonesIII commented 1 year ago
LloydJonesIII commented 1 year ago

08-01-23


LloydJonesIII commented 1 year ago

08-07-23

Got a working variant of the splitter function; it seems to be splitting files properly.