Magdoll / Cogent

Coding Genome Reconstruction using Iso-Seq data
BSD 3-Clause Clear License

generate_batch_cmd_for_Cogent_family issue #67

Closed qiuxx221 closed 5 years ago

qiuxx221 commented 5 years ago

Hi, I was running a flnc.fasta file with around 50,000 sequences to construct gene families. After run_preCluster.py, my fasta was split into bins successfully. Then I ran generate_batch_cmd_for_Cogent_family_finding.py, which finished within a minute without any error. But when I checked the bins, all of them had failed.

When I run it on a small dataset, everything seems to be fine. I have both minimap2 and cupcake installed. My only guess about what went wrong is that I could not find the rarefaction folder in the cupcake GitHub directory, so I was not able to set up the path. Can you help me figure out whether that is the problem?

Thanks!

Magdoll commented 5 years ago

Hi @qiuxx221 ,

It is difficult to figure out what is going on without more information. Please give me the full list of commands you used and at each step the output result (you can just do a ls -lh on the directory to show me the file and file sizes as a start)

Thanks, --Liz

qiuxx221 commented 5 years ago

Hi Liz,

Thanks for getting back to me. Here is what I did and files that I have in the folder.

  1. clean_hf_iso.fasta is the flnc.fasta file from the isoseq3 pipeline after removing contamination.

-rw-r--r-- 1 qiuxx221 watkinse 134810240 May 2 09:26 clean_hf_iso.fasta

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 14:39 error

-rw-r--r-- 1 qiuxx221 watkinse 1228 May 1 15:51 FastA.N50.pl

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 10:20 Isoseq3_output

drwxr-sr-x 6 qiuxx221 watkinse 4096 May 1 15:37 old

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 14:39 out

-rw-r--r-- 1 qiuxx221 watkinse 2339749 May 1 19:22 removed_bac_cont.fasta

-rw-r--r-- 1 qiuxx221 watkinse 131482 May 2 09:26 removed_vir_cont.fasta

-rw-r--r-- 1 qiuxx221 watkinse 989 May 2 14:40 run_mash.sh

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 09:12 shell

drwx--S--- 5 qiuxx221 watkinse 8192 May 2 16:50 SMRT_Merge

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 14:38 temp_family_finding

  2. I linked the file to isoseq_flnc.fasta and then ran run_preCluster.py:

ln -s clean_hf_iso.fasta isoseq_flnc.fasta

run_preCluster.py --cpus 24

  3. After step 2, I have the files listed below

-rw-r--r-- 1 qiuxx221 watkinse 134810240 May 2 09:26 clean_hf_iso.fasta

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 14:39 error

-rw-r--r-- 1 qiuxx221 watkinse 1228 May 1 15:51 FastA.N50.pl

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 10:20 Isoseq3_output

lrwxrwxrwx 1 qiuxx221 watkinse 18 May 2 16:52 isoseq_flnc.fasta -> clean_hf_iso.fasta

drwxr-sr-x 6 qiuxx221 watkinse 4096 May 1 15:37 old

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 14:39 out

-rw-r--r-- 1 qiuxx221 watkinse 68288 May 2 16:55 preCluster.cluster_info.csv

drwxr-sr-x 9352 qiuxx221 watkinse 1277952 May 2 16:55 preCluster_out

-rw-r--r-- 1 qiuxx221 watkinse 18968 May 2 16:54 preCluster_out.chimeras.fasta

-rw-r--r-- 1 qiuxx221 watkinse 29439173 May 2 16:54 preCluster_out.orphans.fasta

-rw-r--r-- 1 qiuxx221 watkinse 1328137 May 2 16:54 preCluster.output.csv

-rw-r--r-- 1 qiuxx221 watkinse 2339749 May 1 19:22 removed_bac_cont.fasta

-rw-r--r-- 1 qiuxx221 watkinse 131482 May 2 09:26 removed_vir_cont.fasta

-rw-r--r-- 1 qiuxx221 watkinse 989 May 2 14:40 run_mash.sh

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 09:12 shell

drwx--S--- 5 qiuxx221 watkinse 8192 May 2 16:50 SMRT_Merge

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 14:38 temp_family_finding

  4. Then I ran generate_batch_cmd_for_Cogent_family_finding.py

(anaCogent) -bash-4.2$ generate_batch_cmd_for_Cogent_family_finding.py --cpus=24 --cmd_filename=cmd preCluster.cluster_info.csv preCluster_out HF_test

I got

-rw-r--r-- 1 qiuxx221 watkinse 134810240 May 2 09:26 clean_hf_iso.fasta

-rw-r--r-- 1 qiuxx221 watkinse 4920748 May 2 16:57 cmd

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 14:39 error

-rw-r--r-- 1 qiuxx221 watkinse 1228 May 1 15:51 FastA.N50.pl

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 10:20 Isoseq3_output

lrwxrwxrwx 1 qiuxx221 watkinse 18 May 2 16:52 isoseq_flnc.fasta -> clean_hf_iso.fasta

drwxr-sr-x 6 qiuxx221 watkinse 4096 May 1 15:37 old

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 14:39 out

-rw-r--r-- 1 qiuxx221 watkinse 68288 May 2 16:55 preCluster.cluster_info.csv

drwxr-sr-x 9711 qiuxx221 watkinse 1327104 May 2 16:55 preCluster_out

-rw-r--r-- 1 qiuxx221 watkinse 18968 May 2 16:54 preCluster_out.chimeras.fasta

-rw-r--r-- 1 qiuxx221 watkinse 29439173 May 2 16:54 preCluster_out.orphans.fasta

-rw-r--r-- 1 qiuxx221 watkinse 1328137 May 2 16:54 preCluster.output.csv

-rw-r--r-- 1 qiuxx221 watkinse 2339749 May 1 19:22 removed_bac_cont.fasta

-rw-r--r-- 1 qiuxx221 watkinse 131482 May 2 09:26 removed_vir_cont.fasta

-rw-r--r-- 1 qiuxx221 watkinse 989 May 2 14:40 run_mash.sh

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 09:12 shell

drwx--S--- 5 qiuxx221 watkinse 8192 May 2 16:50 SMRT_Merge

drwxr-sr-x 2 qiuxx221 watkinse 4096 May 2 14:38 temp_family_finding

So at this step, the HF_test folder was not created.

  5. Here is what the cmd file looks like

cd /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/3

run_mash.py -k 30 --cpus=24 /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/3/isoseq_flnc.fasta

process_kmer_to_graph.py /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/3/isoseq_flnc.fasta /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/3/isoseq_flnc.fasta.s1000k30.dist /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/HF_test 3

cd /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/4

run_mash.py -k 30 --cpus=24 /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/4/isoseq_flnc.fasta

process_kmer_to_graph.py /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/4/isoseq_flnc.fasta /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/4/isoseq_flnc.fasta.s1000k30.dist /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/HF_test 4

cd /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/6

run_mash.py -k 30 --cpus=24 /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/6/isoseq_flnc.fasta

process_kmer_to_graph.py /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/6/isoseq_flnc.fasta /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/preCluster_out/6/isoseq_flnc.fasta.s1000k30.dist /panfs/roc/groups/2/watkinse/qiuxx221/Result_files/Isoseq/HF_test 6

That is as far as I have gotten, and I am not sure where it went wrong.

Thanks for the help! Yinjie


Magdoll commented 5 years ago

Hi @qiuxx221 ,

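You need to run the command file cmd. generate_batch_cmd_for_Cogent_family_finding.py only writes the commands into that file; it does not execute them. It looks like you have not run it yet, hence there are no results.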
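As a minimal sketch of that step, assuming the generated file is a plain list of shell commands (as the excerpt earlier in this thread suggests), it can be handed to bash directly. The helper name run_cmd_file and the log file name are hypothetical and only for illustration; they are not part of Cogent.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cogent): run a generated command file
# with bash and capture everything it prints in a log file.
# Usage: run_cmd_file <cmd_file> [log_file]
run_cmd_file() {
    local cmd_file="$1"
    local log_file="${2:-${cmd_file}.log}"

    # Refuse to continue if the command file is missing or empty.
    if [ ! -s "$cmd_file" ]; then
        echo "command file '$cmd_file' is missing or empty" >&2
        return 1
    fi

    # bash executes the file top to bottom, so each `cd` line applies
    # to the commands that follow it.
    bash "$cmd_file" > "$log_file" 2>&1
}

# Example, run from the directory that holds the file named cmd:
#   run_cmd_file cmd cmd.log
#   ls -lh    # then check whether the output directory (HF_test here) appeared
```

Running the commands one after another like this can take a long time with thousands of bins; on a cluster, the file could instead be split into chunks and submitted as separate jobs, which is a design choice that depends on the local scheduler.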

--Liz