claczny / VizBin

Repository of our application for human-augmented binning

Annotation file #42

Closed ShrutiKetan closed 5 years ago

ShrutiKetan commented 5 years ago

After MaxBin, the binned files were concatenated into one file to serve as the input for VizBin, with the contig IDs as headers (e.g., >k123, >k99, etc.). The annotation file was made with a label header as the first line, followed by 1-11 as categorical variables. But after multiple tries, the plot still shows only blue and red. Can you please help me figure this out?

Thank you very much

claczny commented 5 years ago

Hi @ShrutiKetan,

could you please paste the first 20-or-so lines of your annotation file?

Thank you!

ShrutiKetan commented 5 years ago

Yes, sure.

  1. annotation file

    label
    001
    001
    001
    001
    001
    001
    001
    001
    001
    001
    001
    001
    001
    001
    001
    001
    001
    001
    001

  2. contig.fasta

    >k91_90
    TCGATATATCCAACCCACGACTAATACATATACCGCTAGCGACACAAATGGATACCATACTTCGAAAATG
    GTGAGTACTCGTTCTTCTAGGTCAAGACACTACTCAATCCTGTTATCGTAGTTGATGCAATCATTTTTCT
    CGCACCTCTTCTCTACACTCAGACGCTTGCCGAAGATCCAACCGTTCCACGAGTTGTTGATTCTTTACAG
    CTTTTCTCTGAAGTTGTCTCACATCCATTACTCTCACGAGCCTCGATTGTGCTATTCTTGAATAAGCTCG
    ATCTTACTGCTCGCGCTATCAAACAGGGAAAACGAGTCTCCGATTACTTAAGAGGTTATCCGAAGGATGA
    AAGTGAAAATAATATTTCAAGTCTTGTTAAAGGTAGATCAAAGTTCATCATGTTCGATACCTCATGACTA
    ACGTCATCTCTTTCAGCTTTTCGAACAAAGTACAAATCTATCTACAAGACCCTTTCACCCCCTCACAGGA
    TGTTCTTTTGCCATGAAACAAGCGTAATCGATTCAAA
    >k91_113
    CTTTTACAGAAACTTACATAAATTCAGGAATATAACGGCATCTTTGGGGAATAACACGCCATCAATGATT
    TCGTCTTTGGACATCGCGTGGGGTAGACCAAGTGGACCAACAGGTCTGAATCGACTGCACTAAATTAACA
    CATCAGCTTTTGCATGCTGACTACATTACTATGACTATACCTCTTGAATTACGGCCTGTGTATACGGTAG
    ATTAGGTAAGTCTTCCCATTGTGGAACCCTGTCCGAGCCGATAACACGATCAATCTCTTCTTGAGCCTTT
    CGCTGCACATCAGGATGACACGCGAGAACTAGAATGGCGCTTTGCAGGGTGGCTGAAGACGTGTCTGAAC
    CCTCCAAAAGCACACCTCCGAGGTTGCTTGTTCTCACATCAGAAAATAGGCGCGGATTTGGTAAAAAGAT
    GCTTACAATAAATGCTCGCGGCTAGTAAGCCCCCATTCGTTTGAATTTTTGATTGCCTGCTCCATGAAAC
    AGCCGTTGCCCATATTTTTCTCGAGTCTTTTCTCGACATGTTCTAGAAGACGTCCGTAAAGCTTCTCATG
    GAGAGCTATTACCCTCTTTATTTGTCGTTTCCATTTGGCC

Thank you for the prompt reply.

Regards, Shruti Kutmutia B3, SCELSE


From: Cedric Laczny [notifications@github.com] Sent: Thursday, April 25, 2019 3:18 PM To: claczny/VizBin Cc: Kutmutia Shruti Ketan; Mention Subject: Re: [claczny/VizBin] Annotation file (#42)


CONFIDENTIALITY: This email is intended solely for the person(s) named and may be confidential and/or privileged. If you are not the intended recipient, please delete it, notify us and do not copy, use, or disclose its contents. Towards a sustainable earth: Print only when necessary. Thank you.

claczny commented 5 years ago

Hmmm... 🤔

The first lines look about right...

Some thoughts:

Alternatively, I would be happy to have a look at your files, if it would be possible for you to share them confidentially.

Best,

Cedric

ShrutiKetan commented 5 years ago

Hi Cedric,

  1. Yes, the number of lines in the annotation file is the number of contigs + 1.
  2. uniq -c annotation.list

         1 label
     32444 001
     79694 002
     37980 003
     22250 004
     61323 005
     22297 006
     70103 007
    105050 008
     23495 009
     12560 010
     11892 011
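The two checks above (line count versus contig count, and the per-label tally) can also be sketched in Python. This is only a rough illustration: it assumes a plain-text FASTA file and an annotation file with one label per line after a header line. The function name `check_annotation` and the `min_len` parameter are hypothetical and not part of VizBin. Note that, unlike `uniq -c`, the tally below counts labels across the whole file rather than per adjacent run.

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def check_annotation(fasta_path: Path, annotation_path: Path,
                     min_len: int = 1000) -> Dict[str, object]:
    """Compare contig counts with annotation lines and tally the labels."""
    lengths = []  # one entry per contig, in file order
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                lengths.append(0)           # new contig starts here
            elif line and lengths:
                lengths[-1] += len(line)    # sequence may span several lines

    with open(annotation_path) as handle:
        rows = [line.strip() for line in handle if line.strip()]
    header = rows[0] if rows else ""        # assumed header line, e.g. "label"
    labels = rows[1:]

    return {
        "contigs": len(lengths),
        "short_contigs": sum(1 for n in lengths if n < min_len),
        "header": header,
        "labels": len(labels),
        "label_counts": Counter(labels),
        "lines_match": len(labels) == len(lengths),
    }
```

A non-zero `short_contigs` value together with `lines_match` being true would indicate that the annotation file still carries labels for contigs below the length threshold.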

> Are all your contigs >= minimum length you specify in VizBin (default: >= 1,000 nt)?

About this, not really! I have contigs of varied lengths. I am sorry, I assumed that VizBin would just ignore contigs below the set threshold.

I have attached an image of what I see in the visualization window.

Regards, Shruti Kutmutia B3, SCELSE


From: Cedric Laczny [notifications@github.com] Sent: Thursday, April 25, 2019 4:55 PM To: claczny/VizBin Cc: Kutmutia Shruti Ketan; Mention Subject: Re: [claczny/VizBin] Annotation file (#42)

Hmmm... 🤔

The first lines look about right...

Some thoughts:

    001
    002
    003
    ...
    011
    label

Alternatively, I would be happy to have a look at your files, if it would be possible for you to share them confidentially.

Best,

Cedric


claczny commented 5 years ago

> Are all your contigs >= minimum length you specify in VizBin (default: >= 1,000 nt)?
> About this, not really! I have varied lengths. I am sorry, I assumed that VizBin would just ignore contigs below the set threshold.

VizBin does ignore contigs below the threshold. However, when an annotation file is used, only the short sequences are dropped, not their respective annotation lines, so the labels can fall out of step with the contigs that remain. This could certainly be improved in VizBin.

Hence, your issue might be due to this. To work around it, you could:

  1. filter the individual MaxBin bins by length (e.g., >= 1,000 nt)
  2. create per-bin annotation files, i.e., files containing your categorical label (no header) once for each sequence in the filtered bin
  3. concatenate the filtered bins -> contigs.size_selected.fa
  4. concatenate the annotation files in the same order -> annotation.csv
  5. put a "label" header line at the top of annotation.csv
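As a rough illustration of the five steps above, here is a minimal Python sketch. It assumes plain-text FASTA bins, labels derived from the order of the bins (001, 002, ...), and a one-label-per-line annotation file with a "label" header, as described in this thread. The function names `read_fasta` and `build_vizbin_inputs` are hypothetical and not part of VizBin or MaxBin.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain-text FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_vizbin_inputs(bin_paths: List[Path], out_fasta: Path,
                        out_annotation: Path, min_len: int = 1000) -> int:
    """Write length-filtered contigs plus a matching annotation file.

    Returns the number of contigs kept.
    """
    kept = 0
    with open(out_fasta, "w") as fasta, open(out_annotation, "w") as annot:
        annot.write("label\n")                  # step 5: header line
        for index, bin_path in enumerate(bin_paths, start=1):
            label = f"{index:03d}"              # assumed labels: 001, 002, ...
            for header, seq in read_fasta(bin_path):
                if len(seq) < min_len:
                    continue                    # step 1: drop contig and label together
                fasta.write(f">{header}\n{seq}\n")   # step 3
                annot.write(f"{label}\n")            # steps 2 and 4
                kept += 1
    return kept
```

Writing the sequence and its label in the same loop iteration keeps the two output files aligned line for line, which is the point of filtering before building the annotation file.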

> I have attached an image of what I see in the visualization window.

Unfortunately, I could not see the attachment. I assume this is because you replied via email, and GitHub's issue system may not support attachments sent that way. Feel free to add the image directly to this issue (https://github.com/claczny/VizBin/issues/42).

Depending on the size of contig.fa, you might also give BusyBee Web (https://ccb-microbe.cs.uni-saarland.de/busybee) a try. It addresses some of the shortcomings of VizBin: it is web-based and thus has no user-level dependencies. However, it provides less flexibility and has a maximum upload limit.

Hope this helps.

If your issue is solved, please do not forget to close it. Should you have further questions, kindly let me know.

Best,

Cedric

ShrutiKetan commented 5 years ago

Hi Cedric, everything works well now. Just a trivial question: in the metagenomic binning of a sample with a lot of fungi, the bins aren't evident at all. Can you please suggest a better way to do this? Thank you so much.

Regards, Shruti Kutmutia B3, SCELSE



claczny commented 5 years ago

Hi @ShrutiKetan,

that's great to hear!

Could you please share briefly how you solved your issue? I will also close the issue, but feel free to post related questions after that, if needed.

Regarding the binning of fungal genomes, I have to admit that I do not have much experience with this. I know from experiences shared by others that fungal bins frequently separate rather clearly from bacterial bins, e.g., in the case of bacterial "symbionts" in/on fungi, but in those cases the fungal organism was the host.

I assume that in your case you have a microbiome derived from a "complex" host/environment (e.g., an insect, a mammal, or waste water)? The outcome also depends on the complexity/diversity of the microbiome. Unfortunately (or maybe fortunately, depending on one's point of view), there is currently no single best binning solution, so one should try multiple tools. This is also why "binning consolidators" such as DAS Tool (https://www.nature.com/articles/s41564-018-0171-1) have been developed.

Best,

Cedric