BenoitMorel / covid19_cme_analysis

GNU Affero General Public License v3.0

subprocess.CalledProcessError: #1

Closed vinitamehlawat closed 2 years ago

vinitamehlawat commented 2 years ago

Hi

I cloned this repo on my Ubuntu system and ran setup.sh, and everything ran perfectly. I ran the first script, ./pipeline/0_get_data.py, with the path to my FASTA file, and it created a folder named with the current date in work_dir. But when I tried to run ./pipeline/1_preprocess_data.py work_dir/2021-10-13_00/fmsan, it gave the following error:

/home/vinita/covid19_cme_analysis-master/scripts/preanalysis1.sh /home/vinita/covid19_cme_analysis-master/work_dir/2021-10-13_00/covid_raw_unaligned.fasta fmsan /home/vinita/covid19_cme_analysis-master/scripts /home/vinita/covid19_cme_analysis-master/software/mafft/mafft.bat /home/vinita/covid19_cme_analysis-master/config/outgroups.txt /home/vinita/covid19_cme_analysis-master/work_dir/2021-10-13_00 48
Totally 0 sequences pass the filter of less than 10 Ns
/home/vinita/covid19_cme_analysis-master/scripts/preanalysis1.sh: line 92: /home/vinita/covid19_cme_analysis-master/scripts/../software/genesis/bin/apps/remove_sequences: No such file or directory
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis-master/./pipeline/1_preprocess_data.py", line 14, in <module>
    preprocessing.trim_separate_align(paths.raw_sequences,
  File "scripts/preprocessing.py", line 24, in trim_separate_align
    subprocess.check_call(cmd, cwd=runsdir)
  File "/home/vinita/.local/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/vinita/covid19_cme_analysis-master/scripts/preanalysis1.sh', '/home/vinita/covid19_cme_analysis-master/work_dir/2021-10-13_00/covid_raw_unaligned.fasta', 'fmsan', '/home/vinita/covid19_cme_analysis-master/scripts', '/home/vinita/covid19_cme_analysis-master/software/mafft/mafft.bat', '/home/vinita/covid19_cme_analysis-master/config/outgroups.txt', '/home/vinita/covid19_cme_analysis-master/work_dir/2021-10-13_00', '48']' returned non-zero exit status 127

I have checked the genesis folder for bin/apps/remove_sequences, but remove_sequences is present in the build folder. I also tried to install this repo on a Mac and on Linux Mint, but setup.sh gives an error and the repo cannot be installed on these two OSes.

Kindly have a look and suggest how I should run these scripts to get better results for my SARS-CoV-2 data.
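Exit status 127 from a shell script conventionally means that a command could not be found, which matches the "No such file or directory" message for remove_sequences. A minimal sketch for locating where a built binary actually ended up is shown here; the function name `find_executables` is hypothetical and not part of the repository.

```python
import os


def find_executables(root, name):
    """Return sorted paths under `root` named `name` that are executable.

    Hypothetical helper for illustration; it is not part of the pipeline.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            candidate = os.path.join(dirpath, name)
            # os.X_OK checks that the current user may execute the file.
            if os.access(candidate, os.X_OK):
                matches.append(candidate)
    return sorted(matches)
```

Calling `find_executables("software/genesis", "remove_sequences")` from the repository root would list every executable copy, which can then be compared with the path that preanalysis1.sh expects.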

pierrebarbera commented 2 years ago

Hi @vinitamehlawat

sorry for the hard-to-decipher error message! Looks like the issue is related to your input sequences:

Totally 0 sequences pass the filter of less than 10 Ns

The filtering criteria set out in our scripts will eliminate any input sequence that has more than 10 N characters. If you want to tinker with this setting, the relevant line is here: https://github.com/BenoitMorel/covid19_cme_analysis/blob/a034f9b811e8a33bf387a710bbb6bed846eb10cd/scripts/filterSequences.pl#L17
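As a rough illustration of this kind of filter (a sketch in Python, not the Perl code in filterSequences.pl), the check can be written as follows. The function names are hypothetical, and the strict `<` comparison is an assumption based on the "less than 10 Ns" log message.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_n(sequence):
    """Count ambiguous N characters, ignoring case."""
    return sequence.upper().count("N")


def passing_n_filter(records, less_than_n=10):
    """Keep records with fewer than `less_than_n` Ns (strictness is assumed)."""
    return [(h, s) for h, s in records if count_n(s) < less_than_n]
```

Running `passing_n_filter(read_fasta(open(path)))` on the raw FASTA file and printing the length of the result gives an independent count to compare with the number reported by the pipeline.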

Let me know if that works!

Pierre

vinitamehlawat commented 2 years ago

Hi Pierre

Thanks for your prompt response. I have counted the number of Ns in my FASTA sequences, and most of them, though not all, have more than 50 Ns.


pierrebarbera commented 2 years ago

Hi @vinitamehlawat,

then I suggest increasing $lessthanN to an appropriately high number such that your sequences pass this filter. Note however that 10 is the number we used in the paper, so I can't say what that will do to the quality of the output. My guess is that the inferred trees will get worse, as you're allowing for lower quality data.

Pierre

vinitamehlawat commented 2 years ago

Hi @Pbdas

As per your suggestion, I changed my $lessthanN = 10; to my $lessthanN = 50, but I got the same error:

./pipeline/1_preprocess_data.py work_dir/2021-10-18_00/fmsan
/home/vinita/covid19_cme_analysis-master/scripts/preanalysis1.sh /home/vinita/covid19_cme_analysis-master/work_dir/2021-10-18_00/covid_raw_unaligned.fasta fmsan /home/vinita/covid19_cme_analysis-master/scripts /home/vinita/covid19_cme_analysis-master/software/mafft/mafft.bat /home/vinita/covid19_cme_analysis-master/config/outgroups.txt /home/vinita/covid19_cme_analysis-master/work_dir/2021-10-18_00 48
Totally 0 sequences pass the filter of less than 50 Ns
/home/vinita/covid19_cme_analysis-master/scripts/preanalysis1.sh: line 92: /home/vinita/covid19_cme_analysis-master/scripts/../software/genesis/bin/apps/remove_sequences: No such file or directory
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis-master/./pipeline/1_preprocess_data.py", line 14, in <module>
    preprocessing.trim_separate_align(paths.raw_sequences,
  File "/home/vinita/covid19_cme_analysis-master/scripts/preprocessing.py", line 24, in trim_separate_align
    subprocess.check_call(cmd, cwd=runsdir)
  File "/home/vinita/miniconda3/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/vinita/covid19_cme_analysis-master/scripts/preanalysis1.sh', '/home/vinita/covid19_cme_analysis-master/work_dir/2021-10-18_00/covid_raw_unaligned.fasta', 'fmsan', '/home/vinita/covid19_cme_analysis-master/scripts', '/home/vinita/covid19_cme_analysis-master/software/mafft/mafft.bat', '/home/vinita/covid19_cme_analysis-master/config/outgroups.txt', '/home/vinita/covid19_cme_analysis-master/work_dir/2021-10-18_00', '48']' returned non-zero exit status 127.

This is how I confirmed the number of Ns in my sequences:

N:3 A:8912 C:5478 G:5850 T:9595
hCoV-19/India/MH-NEERI-NGP-27337/2020 A:8909 C:5469 G:5853 T:9606
hCoV-19/India/MH-NEERI-NGP-27505/2020 N:181 A:8860 C:5436 G:5814 T:9547
hCoV-19/India/MH-NEERI-NGP-27621/2020 A:8910 C:5472 G:5854 T:9603
hCoV-19/India/MH-NEERI-NGP-27795/2020 N:24 A:8909 C:5468 G:5841 T:9597
hCoV-19/India/MH-NEERI-NGP-27991/2020 N:194 A:8852 C:5441 G:5821 T:9530
hCoV-19/India/MH-NEERI-NGP-28141/2020 N:2 A:8908 C:5479 G:5855 T:9595
hCoV-19/India/MH-NEERI-NGP-28251/2020 N:176 A:8858 C:5450 G:5816 T:9538

Now I realised that the error stays the same even if I change my $lessthanN to 50, because, as you can see, the number of Ns varies quite randomly across my sequences.
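To see how the choice of threshold plays out, the eight N counts shown in the summary above can be swept over a few candidate values. This is only a sketch: entries without an "N:" field are assumed to contain zero Ns, the strict `<` comparison is assumed from the log message, and the threshold 200 is an arbitrary illustrative value.

```python
# N counts copied from the eight entries posted above.
# Assumption: an entry with no "N:" field has zero Ns.
n_counts = [3, 0, 181, 0, 24, 194, 2, 176]


def passing_per_threshold(counts, thresholds):
    """Map each threshold to the number of counts strictly below it."""
    return {t: sum(1 for c in counts if c < t) for t in thresholds}


result = passing_per_threshold(n_counts, [10, 50, 200])
print(result)  # {10: 4, 50: 5, 200: 8}
```

For these eight entries, some sequences would pass at both 10 and 50, so a report of 0 passing sequences may point to an earlier step emptying the input rather than to the N threshold itself.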

Any advice or assistance would be greatly appreciated!

Vinita

vinitamehlawat commented 2 years ago

Hi @Pbdas

I have looked for /software/genesis/bin/apps/remove_sequences, but there is nothing in the apps folder. I then downloaded genesis and tried to build it using make, from both the bin and the build folders, but it shows this error:

make[2]: *** [lib/genesis/CMakeFiles/genesis_lib_shared.dir/build.make:63: lib/genesis/CMakeFiles/genesis_lib_shared.dir///genesis_unity_sources/lib/all.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:144: lib/genesis/CMakeFiles/genesis_lib_shared.dir/all] Error 2
make: *** [Makefile:84: all] Error 2

I noticed that this error is the same in both places, i.e. in your repo's software/genesis and in the locally installed genesis.

I would appreciate it if you could suggest why genesis is not installing, and whether this is responsible for my error in 1_preprocess_data.py.

Thanks Vinita

vinitamehlawat commented 2 years ago

Hi @Pbdas

I was on leave for some time, so today I cloned the https://github.com/BenoitMorel/covid19_cme_analysis repo again. Unfortunately, I am still confused, because I saw the same kind of error, though this time not regarding genesis. I would really appreciate it if you could guide me on which folder in work_dir I should use first, whether fmsao or smsan, and after that how we should run the 3rd command, and with which subfolders in these folders.

Here I am pasting the command I used to run the 1st script to preprocess my raw data.

**./pipeline/1_preprocess_data.py work_dir/2021-11-15_00/smsan/**
/home/vinita/covid19_cme_analysis/scripts/preanalysis1.sh /home/vinita/covid19_cme_analysis/work_dir/2021-11-15_00/covid_raw_unaligned.fasta smsan /home/vinita/covid19_cme_analysis/scripts /home/vinita/covid19_cme_analysis/software/mafft/mafft.bat /home/vinita/covid19_cme_analysis/config/outgroups.txt /home/vinita/covid19_cme_analysis/work_dir/2021-11-15_00 48
Totally 0 sequences pass the filter of less than 10 Ns
12:50:21 INFO Specified: infile = covid_raw_unaligned_oneline_fullseq_ns.fasta
12:50:21 INFO Specified: exclude_file = /home/vinita/covid19_cme_analysis/config/outgroups.txt
12:50:21 INFO Started
12:50:21 INFO Removed 0 seqeunces.
12:50:21 INFO Finished
mv: cannot stat 'covid_raw_unaligned_oneline_fullseq_ns_tmp.fasta': No such file or directory
Traceback (most recent call last):
  File "/home/vinita/covid19_cme_analysis/./pipeline/1_preprocess_data.py", line 14, in <module>
    preprocessing.trim_separate_align(paths.raw_sequences,
  File "/home/vinita/covid19_cme_analysis/scripts/preprocessing.py", line 24, in trim_separate_align
    subprocess.check_call(cmd, cwd=runsdir)
  File "/home/vinita/miniconda3/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/vinita/covid19_cme_analysis/scripts/preanalysis1.sh', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-15_00/covid_raw_unaligned.fasta', 'smsan', '/home/vinita/covid19_cme_analysis/scripts', '/home/vinita/covid19_cme_analysis/software/mafft/mafft.bat', '/home/vinita/covid19_cme_analysis/config/outgroups.txt', '/home/vinita/covid19_cme_analysis/work_dir/2021-11-15_00', '48']' returned non-zero exit status 1. 

Please help me understand how I should run these scripts and where I am making mistakes. In my preanalysis_runs folder there are three files, listed below; one is the raw FASTA file and the other two are empty:

-rw-rw-r-- 1 vinita vinita 70M Nov 15 12:52 covid_raw_unaligned_oneline.fasta
-rw-rw-r-- 1 vinita vinita   0 Nov 15 12:52 covid_raw_unaligned_oneline_fullseq.fasta
-rw-rw-r-- 1 vinita vinita   0 Nov 15 12:52 covid_raw_unaligned_oneline_fullseq_ns.fasta
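A listing like this can also be checked programmatically to find the first stage that produced no output. The sketch below assumes the three files are written in the order suggested by their names; the helper `first_empty_file` is hypothetical and not part of the repository.

```python
import os


def first_empty_file(paths):
    """Return the first path in `paths` that exists with size 0, else None."""
    for path in paths:
        if os.path.exists(path) and os.path.getsize(path) == 0:
            return path
    return None


# Assumed stage order, inferred from the file names in the listing above.
stage_outputs = [
    "covid_raw_unaligned_oneline.fasta",
    "covid_raw_unaligned_oneline_fullseq.fasta",
    "covid_raw_unaligned_oneline_fullseq_ns.fasta",
]
```

Running `first_empty_file(stage_outputs)` inside preanalysis_runs would, for the listing above, name covid_raw_unaligned_oneline_fullseq.fasta, which suggests looking at the step that writes that file.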

I have one more doubt: in https://github.com/BenoitMorel/covid19_cme_analysis/wiki you mention that FMSAO is full MSA with output, but in your paper you mention that FMSAO is Full MSA with bat and pangolin Outgroups. Which one should we consider?

Thank you so much for your time Vinita

pierrebarbera commented 2 years ago

Dear Vinita,

it looks like the cause of the error is that the sequences are not full length (less than 29k nt). Is that possible? That's the only reason why the oneline_fullseq.fasta may be empty.
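One way to check this is to measure the length of every sequence in the raw FASTA file. The following is a minimal sketch: the function names are hypothetical, and the cut-off of 29000 is an assumption based on the "29k" figure above, not the exact value used in the pipeline scripts.

```python
def sequence_lengths(lines):
    """Map each FASTA header to the total length of its sequence."""
    lengths, header = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            lengths[header] = 0
        elif header is not None:
            lengths[header] += len(line)
    return lengths


def full_length(lengths, min_length=29000):
    """Keep entries at least `min_length` long (29000 is an assumed cut-off)."""
    return {h: n for h, n in lengths.items() if n >= min_length}
```

For example, `full_length(sequence_lengths(open("covid_raw_unaligned.fasta")))` returns the sequences that would count as full length under this assumed cut-off; an empty result would be consistent with an empty oneline_fullseq.fasta.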

Pierre