Open PhyloGrok opened 1 year ago
The .fastq sequence retrieval step now looks robust. Storage may become an issue, so we will likely need to request additional storage for larger datasets. The .sam, .bam, and .bcf files are also very large, so we'll need to delete them dynamically after processing to conserve storage.
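A minimal sketch of that cleanup step, assuming intermediate files are named with the sample accession as a prefix (as in SRR9025118.aligned.sam). The function name remove_intermediates and the directory layout are hypothetical, not taken from the existing scripts.

```python
from pathlib import Path

# Intermediate formats mentioned above; adjust if the workflow adds others.
INTERMEDIATE_SUFFIXES = (".sam", ".bam", ".bcf")


def remove_intermediates(results_dir, sample_id, suffixes=INTERMEDIATE_SUFFIXES):
    """Delete intermediate files for one sample and return the bytes freed.

    Hypothetical helper: assumes file names start with the accession
    followed by '.' or '_', e.g. SRR9025118.aligned.sam or SRR9025118_raw.bcf.
    """
    freed = 0
    for path in Path(results_dir).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        name = path.name
        # Require a separator after the accession so SRR1 does not match SRR10.
        if name.startswith(sample_id + ".") or name.startswith(sample_id + "_"):
            freed += path.stat().st_size
            path.unlink()
    return freed
```

Calling this only after the final .vcf for a sample has been written keeps the result while releasing the space used by the larger intermediates.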
Currently waiting for Lloyd's Python scripts, which will take user inputs and control the execution of the workflow steps.
This Stack Overflow thread has some initial ideas; they use the Python function subprocess.call(): https://stackoverflow.com/questions/32085956/pass-a-variable-from-python-to-shell-script
Here's a different one where they seem to pass the output of test.py to a shell variable: https://stackoverflow.com/questions/2796932/how-do-i-pass-a-python-variable-to-bash
This one has a simple example of how the script should run, where Python variables are passed to the bash shell. https://unix.stackexchange.com/questions/466190/passing-python-variable-to-embedded-shell-script
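The idea in these threads can be sketched as follows, using subprocess.run() rather than subprocess.call() so that a failing step raises an error. The helper name run_step and the OUT_DIR variable are hypothetical; the real scripts may expect different arguments.

```python
import os
import subprocess


def run_step(script, *args, env_vars=None):
    """Run a shell script with positional arguments and extra environment variables.

    Positional arguments arrive in the script as $1, $2, ...;
    entries in env_vars arrive as ordinary shell variables.
    """
    env = dict(os.environ)
    env.update(env_vars or {})
    result = subprocess.run(
        ["bash", script, *args],  # a list avoids shell quoting problems
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, run_step("R03.sh", "SRR9025118", env_vars={"OUT_DIR": "/some/path"}) would make the accession available as $1 and the directory as $OUT_DIR inside the script, assuming the script reads them that way.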
./R03.sh: line 21: datasets: command not found ** New error found; I will continue to work on this over the weekend when I can.
Completed first working variant of the Python Hub
Made major headway on the Python split function, which processes our SRA lists so that multiple Trimmomatic instances can run in parallel.
Testing Python hub code trial3.py. You need the following function located in the same directory:
https://stackoverflow.com/questions/48209410/cant-open-sh-file
https://stackoverflow.com/questions/32085956/pass-a-variable-from-python-to-shell-script
https://stackoverflow.com/questions/65153137/multiple-inputs-using-subprocess-run-in-python-3-7
https://stackoverflow.com/questions/17742789/running-multiple-bash-commands-with-subprocess
https://stackoverflow.com/questions/7585435/best-way-to-convert-string-to-bytes-in-python-3
https://stackoverflow.com/questions/3172470/actual-meaning-of-shell-true-in-subprocess#:~:text=Setting%20the%20shell%20argument%20to,before%20the%20command%20is%20run.
https://unix.stackexchange.com/questions/242334/notepad-adds-r-to-shell-scripts
https://support.nesi.org.nz/hc/en-gb/articles/218032857-Converting-from-Windows-style-to-UNIX-style-line-endings#:~:text=Converting%20using%20Notepad%2B%2B&text=To%20write%20your%20file%20in,with%20UNIX%2Dstyle%20line%20endings.
https://stackoverflow.com/questions/31786287/how-to-split-large-text-file-in-windows
https://www.tutorialspoint.com/How-to-read-a-file-from-command-line-using-Python#:~:text=Reading%20a%20file%20from%20command,file%20and%20read%20its%20contents.
https://stackoverflow.com/questions/17255737/importing-variables-from-another-file
https://www.pythonforbeginners.com/files/the-fastest-way-to-split-a-text-file-using-python
The Python split.py file concept has been tested and confirmed to work with the Trimmomatic command-line shell scripts.
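For reference, one way such a split could look is sketched below. This is not the actual split.py; the function name split_accessions and the round-robin strategy are assumptions made for illustration.

```python
def split_accessions(lines, n_parts):
    """Split a list of SRA accession lines into up to n_parts chunks.

    Blank lines and surrounding whitespace (including stray carriage
    returns from Windows-edited files) are dropped. Accessions are dealt
    out round-robin so chunk sizes differ by at most one.
    """
    ids = [line.strip() for line in lines if line.strip()]
    if not ids:
        return []
    n_parts = max(1, min(n_parts, len(ids)))  # never create empty chunks
    return [ids[i::n_parts] for i in range(n_parts)]
```

Each chunk can then be written to its own list file and handed to a separate Trimmomatic shell invocation.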
To get the hub Python script to work as intended, a higher-level Python script has been added to the script order so that the user is not constantly prompted for inputs, which is not what we wanted. The new script is called
Finished creating and testing the head Python controller, which will control all downstream shell and Python scripts.
Code error found:
[bwa_index] Pack FASTA... 0.01 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.39 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.16 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index ../../media/volume/sdb/attempt7/assembly/reference/ref_genome
[main] Real time: 0.617 sec; CPU: 0.574 sec
SRR9025102 Variant calling process has begun
[M::bwa_idx_load_from_disk] read 0 ALT contigs
'.::main_mem] fail to open file `
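The overwritten start of that last message may point to a carriage return inside a file path, which is what Windows-style line endings in a script or accession list would produce. That cause is an assumption, not something the log confirms. A minimal sketch for normalizing such files before they are used (the helper name to_unix_line_endings is hypothetical):

```python
from pathlib import Path


def to_unix_line_endings(path):
    """Rewrite a file in place with LF line endings.

    Returns the number of CRLF sequences that were replaced.
    The file is only rewritten if something actually changed.
    """
    p = Path(path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        p.write_bytes(fixed)
    return data.count(b"\r\n")
```

Running this on the shell scripts and the SRA list files after editing them on Windows would rule line endings in or out as the cause.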
Work completed Friday 07-07-23:
[main] CMD: bwa mem ../../media/volume/sdb/attempt11/assembly/reference/ref_genome.fasta ../../media/volume/sdb/attempt11/fastq/trimmed/SRR9025118_1.trim.fastq.gz ../../media/volume/sdb/attempt11/fastq/trimmed/SRR9025118_2.trim.fastq.gz
[main] Real time: 25.855 sec; CPU: 26.814 sec
[E::hts_open_format] Failed to open file ../../media/volume/sdb/attempt11/assembly/results/sam/SRR9025118.aligned.sam
samtools view: failed to open "../../media/volume/sdb/attempt11/assembly/results/sam/SRR9025118.aligned.sam" for reading: No such file or directory
[E::fai_build3_core] Failed to open the file ../../media/volume/sdb/attempt11/assembly/reference/ref_genome
[E::hts_open_format] Failed to open file ../../media/volume/sdb/attempt11/assembly/results/bcf/SRR9025118_raw.bcf : No such file or directoryvolume/sdb/attempt11/assembly/results/bcf/SRR9025118_raw.bcf
Can't open ../../media/volume/sdb/attempt11/assembly/results/vcf/SRR9025118.vcf: No such file or directory at /home/exouser/anaconda3/bin/vcfutils.pl line 265.
SRR9025118 Variant calling process has finished
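One possible contributor to the "No such file or directory" errors is that the results subdirectories did not exist when the step ran. This is an assumption and the log does not confirm it. A minimal sketch that creates them up front, where the helper name ensure_result_dirs and the bam subdirectory are hypothetical (sam, bcf, and vcf appear in the paths above):

```python
from pathlib import Path


def ensure_result_dirs(assembly_dir, subdirs=("sam", "bam", "bcf", "vcf")):
    """Create <assembly_dir>/results/<subdir> for each subdir if missing.

    Returns the list of directory paths, whether newly created or not.
    """
    dirs = []
    for name in subdirs:
        d = Path(assembly_dir) / "results" / name
        d.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        dirs.append(d)
    return dirs
```

Calling this from the controller before the variant calling script starts makes the run independent of whether an earlier setup step created the folders.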
to see limits: re-run with '-x' option.
fasterq-dump quit with error code 3
Archive: ../../media/volume/sdb/BigRun1/assembly/2242.zip
inflating: ../../media/volume/sdb/BigRun1/assembly/reference/README.md
inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/assembly_data_report.jsonl
inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/GCF_004799605.1_ASM479960v1_genomic.fna
inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/protein.faa
inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/cds_from_genomic.fna
inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/genomic.gff
inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/genomic.gtf
inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/genomic.gbff
inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/GCF_004799605.1/sequence_report.jsonl
inflating: ../../media/volume/sdb/BigRun1/assembly/reference/ncbi_dataset/data/dataset_catalog.json
Gzip process has begun
gzip: ../../media/volume/sdb/BigRun1/fastq/SRR19515813.fastq.gz: No space left on device
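To fail early instead of partway through a run, the controller could check free space on the volume before each download or gzip step. A minimal sketch using the standard library, where the helper name has_free_space and the 1.2 safety factor are hypothetical choices:

```python
import shutil


def has_free_space(path, required_bytes, safety_factor=1.2):
    """Return True if the volume holding `path` has enough free space.

    required_bytes is the caller's estimate of what the next step will
    write; safety_factor adds headroom on top of that estimate.
    """
    free = shutil.disk_usage(path).free
    return free >= required_bytes * safety_factor
```

If the check fails, the controller can stop, or delete finished intermediates first and check again.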
08-01-23
Got a working variant of the splitter function; it seems to be splitting files properly.
Basically there are 3 parts, and each part should have its own shell script file: