MadsAlbertsen / mmgenome

Please use mmgenome2 instead. Tools for extracting individual genomes from metagenomes
https://kasperskytte.github.io/mmgenome2/

Detailed description of import file for network.pl and extract.fastq.for.reassembly.pl #42

Open chenziwu5 opened 6 years ago

chenziwu5 commented 6 years ago

Hello, thanks for these helpful scripts! Could you explain the input files for these scripts in detail? For example, how do we get the .sam file they need: is mapping the sample's reads to the assembly with Bowtie2 or similar software the right process? And what about the other parameters (-inref, -infastq, ...)? Could you give me an example? Thank you!

Best regards

Kirk3gaard commented 6 years ago

Hi chenziwu5, I recommend that you check out the new mmgenome2 package (https://github.com/KasperSkytte/mmgenome2). It comes with better documentation and some cool new features.

You can get the sam file from mapping with BWA, but I recommend that you check out minimap2 (https://github.com/lh3/minimap2), which is much faster and also supports long reads.

The following should work:

minimap2 -ax sr -t $THREADS results/assembly.fasta data/SEQID.R1.trimmed.fastq data/SEQID.R2.trimmed.fastq > temp/mapping.sam
perl mmgenome/scripts/network.pl -i temp/mapping.sam -f 2 -outcon network.txt

Best regards Rasmus

chenziwu5 commented 6 years ago

Hi Rasmus, your reply is very useful for me! Thank you!

chenziwu5 commented 6 years ago

Hi Rasmus, thanks for your help! I followed your suggestion and got a mapping.sam file using minimap2. However, I ran into a new problem when I ran network.pl, as shown below:

chenziwu@chenziwu-ThinkStation-D30:~$ perl /home/chenziwu/mmgenome-master/scripts/network.pl -i D05_minimap2.sam -f 2
Use of uninitialized value in numeric ge (>=) at /home/chenziwu/mmgenome-master/scripts/network.pl line 146, line 9081001.
Use of uninitialized value in numeric ge (>=) at /home/chenziwu/mmgenome-master/scripts/network.pl line 146, line 9081005.
Use of uninitialized value in numeric ge (>=) at /home/chenziwu/mmgenome-master/scripts/network.pl line 146, .....
.....
.....

There is no result in the network.txt file. Can you give me any suggestions? Thank you so much!

Best regards
chenziwu5
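The second number in each warning (for example `line 9081001`) is the line Perl had reached in the file it was reading when the warning fired; the filehandle name that Perl normally prints before it seems to have been lost in this copy of the thread. Assuming network.pl reads the SAM file line by line, those numbers point at specific SAM records, so looking at them is a reasonable first step. Below is a minimal sketch of that check. `show_sam_lines` is a hypothetical helper, not part of mmgenome, and the toy SAM file only stands in for the real one.

```shell
# Toy SAM file standing in for D05_minimap2.sam (hypothetical contents).
sam=$(mktemp)
{
  printf '@SQ\tSN:1\tLN:100\n'
  printf '@SQ\tSN:2\tLN:100\n'
  printf 'r1\t99\t1\t10\t60\t8M\t=\t50\t48\tACGTACGT\tIIIIIIII\n'
  printf 'r1\t147\t1\t50\t60\t8M\t=\t10\t-48\tACGTACGT\tIIIIIIII\n'
  printf 'r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\n'
} > "$sam"

# show_sam_lines FILE N [N ...]
# Hypothetical helper: for each requested line number, print the number of
# tab-separated fields followed by QNAME, FLAG, RNAME, and RNEXT
# (SAM columns 1, 2, 3, and 7).
show_sam_lines() {
  file=$1
  shift
  for n in "$@"; do
    awk -F'\t' -v n="$n" \
      'NR == n { print "line " NR ": " NF " fields:", $1, $2, $3, $7; exit }' "$file"
  done
}

# Inspect two lines of the toy file: a mapped record and an unmapped one.
show_sam_lines "$sam" 3 5
```

On the real data the call would be `show_sam_lines D05_minimap2.sam 9081001 9081005`. Records with `*` in the RNAME or RNEXT column, or with fewer than 11 fields, would be the first things to look for, although whether either one triggers the warning at line 146 is not established in this thread.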

Kirk3gaard commented 6 years ago

Hi chenziwu5, what do the headers look like in your assembly file?

chenziwu5 commented 6 years ago

Hi Rasmus, the attachment is a screenshot of my assembly file (file name: scaffold2000.fa).

Kirk3gaard commented 6 years ago

Hi chenziwu5 I do not think attaching stuff to the email works.

chenziwu5 commented 6 years ago

Hi Rasmus, sorry for not replying sooner. The headers of my assembly file look like this:

scaffold_0
ATCGATCG.....
scaffold_1
ATCGATCG
scaffold_2
ATCGATCG
scaffold_5
ATCGATCG
.....

Because I extracted only the sequences longer than 2000 bp, the sequence numbers are not consecutive.

Kirk3gaard commented 6 years ago

Hi, try cleaning the FASTA headers using

awk '/^>/{print ">" ++i; next}{print}' input.fasta > outputclean.fasta

and redo the mapping.
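To make the effect of that awk command concrete, here is a small sketch that runs it on a toy FASTA file and confirms that no sequences were lost. The toy file contents and the temporary directory are assumptions for illustration; only the awk program itself comes from the reply.

```shell
# Toy FASTA standing in for the real assembly (hypothetical contents).
workdir=$(mktemp -d)
printf '>scaffold_0 length=8\nATCGATCG\n>scaffold_1 length=8\nATCGATCG\n>scaffold_5 length=8\nATCGATCG\n' \
  > "$workdir/input.fasta"

# Same awk program as in the reply: each header line is replaced by ">"
# plus a running counter; sequence lines are printed unchanged.
awk '/^>/{print ">" ++i; next}{print}' "$workdir/input.fasta" > "$workdir/outputclean.fasta"

# List the new headers and compare sequence counts before and after.
grep '^>' "$workdir/outputclean.fasta"
n_before=$(grep -c '^>' "$workdir/input.fasta")
n_after=$(grep -c '^>' "$workdir/outputclean.fasta")
echo "sequences before: $n_before, after: $n_after"
```

The mapping has to be redone afterwards because the SAM file records the contig names of the FASTA file that was used as the reference at mapping time, so a SAM file made from the old assembly would still carry the old names.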

chenziwu5 commented 6 years ago

Hi Rasmus, thanks for your suggestion, but the same problem seems to occur:

Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70837.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70839.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70845.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70849.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70851.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70855.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70857.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70861.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70865.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70867.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70869.
Use of uninitialized value in numeric ge (>=) at network.pl line 146, line 70871.
......

Kirk3gaard commented 6 years ago

Hi chenziwu5 And still no output file?

chenziwu5 commented 6 years ago

It outputs an empty network.txt file, and the terminal shows the same warnings as before. This puzzles me. I also checked the headers of the assembly file, and they conform to the specified format (>1, >2, >3, ...).

chenziwu5 commented 6 years ago

Hi Rasmus, I tried switching to an assembly from another sample and followed the same protocol as before. This time the terminal did not report any errors, but I still got an empty network file. It contains only the header:

scaffold1 scaffold2 connections

So, do you have any suggestions for this problem? Why does the script run without errors when I switch to another assembly file? Do I need to adjust the script's parameters? Anyway, thank you so much for your help!

Best regards
chenziwu
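The header `scaffold1 scaffold2 connections` suggests that the file lists pairs of scaffolds linked by reads. Assuming those links come from read pairs whose two mates map to different scaffolds (an assumption about network.pl that is not confirmed in this thread), a table with only a header could simply mean that the SAM file contains few or no such pairs. A rough way to check is to count them directly. The two functions below are hypothetical helpers, and the toy SAM file only stands in for the real mapping.

```shell
# Toy SAM file standing in for the real mapping (hypothetical contents).
sam=$(mktemp)
{
  printf '@SQ\tSN:1\tLN:100\n'
  printf '@SQ\tSN:2\tLN:100\n'
  # Pair r1: both mates on contig 1, so RNEXT (column 7) is "=".
  printf 'r1\t99\t1\t10\t60\t8M\t=\t50\t48\tACGTACGT\tIIIIIIII\n'
  printf 'r1\t147\t1\t50\t60\t8M\t=\t10\t-48\tACGTACGT\tIIIIIIII\n'
  # Pair r2: mates on contigs 1 and 2, so RNEXT names the other contig.
  printf 'r2\t65\t1\t20\t60\t8M\t2\t30\t0\tACGTACGT\tIIIIIIII\n'
  printf 'r2\t129\t2\t30\t60\t8M\t1\t20\t0\tACGTACGT\tIIIIIIII\n'
} > "$sam"

# Count mapped records whose mate is mapped to a different contig.
count_cross_contig_records() {
  awk -F'\t' '
    /^@/ { next }                                    # skip header lines
    $3 != "*" && $7 != "*" && $7 != "=" { n++ }      # mate on another contig
    END { print n + 0 }
  ' "$1"
}

# Count mapped records whose mate is on the same contig.
count_same_contig_records() {
  awk -F'\t' '
    /^@/ { next }
    $3 != "*" && $7 == "=" { n++ }
    END { print n + 0 }
  ' "$1"
}

echo "cross-contig records: $(count_cross_contig_records "$sam")"
echo "same-contig records:  $(count_same_contig_records "$sam")"
```

If the cross-contig count on the real SAM file is zero or very small, the empty network would be consistent with the input rather than with a script failure; a large count would point back at the script or its parameters.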
