claczny / VizBin

Repository of our application for human-augmented binning

Annotation file: Label (in numeric form) not displaying default color options #53

Closed jrdickey9 closed 1 year ago

jrdickey9 commented 1 year ago

Howdy there,

I am using VizBin to visualize and manually bin MAGs for host-associated bacterial populations. I have a single .fasta file that contains scaffolds from multiple samples. In other words, multiple sample .fasta files were combined into a single .fasta file so that I can bin genomes across all samples of interest. My goal was to create an annotation file that reflects this, with each sample, or label (#1-9), corresponding to a color. Any color; it really doesn't matter. I created the annotation file from the combined fasta file in an effort to maintain the scaffold order and to record other interesting properties such as length and GC content.

The issue I am having is that the annotation file seems to be only partly working: the size of each point appears to change based on length, which is somewhat helpful, but the labels are not being read.

In the future I would like to add a reference genome of my "bacteria of interest" as a marker to aid my binning (receive more complete bins and avoid contamination as much as possible). I have yet to add this to my annotation file since the labels aren't being read.

Beyond that -- I have MANY scaffolds that I am inputting into VizBin (>4mil). I have set minimum contig length to 2Kb or 3Kb. The annotation file and fasta contain the same number of entries prior to input into VizBin. The minimum contig length does toss out plenty of scaffolds, but not so much that only one label is left.

Below are the head and tail of both my annotation file and the .fasta file.

A) head annotation file

label,length,gc
1,134077,45.42
1,87175,45.16
1,65686,45.71
1,52865,45.92
1,44948,34.86
1,42530,45.30
1,42475,46.38
1,40293,45.94
1,29404,48.00

B) tail annotation file

9,200,56.50
9,200,55.50
9,200,60.00
9,200,53.00
9,200,52.50
9,200,53.00
9,200,42.00
9,200,42.50
9,200,34.00
9,200,57.00

C) head fasta file

>D0_SEK2_2_scaffold_1_c1
CAATCGATACGACCCCGGAGAGCGGCTTTTGCTAAAACTCGAGCAGTTTCTTGAAAACTT
GCTTCTGATATGAAACTTTGAGTATTTAGAGATGCTTTCGTTATTCCCAATAAGATGGCT
CGATAACAGATCGCTTCTTCCAAAGCACGCCCTGTTCGTTCCGCCCGCAACAATTCAATT
AGTTCTCCGGGTGAAAAAACATTAGACATTCTATCTTCTGAAACCAACACTTTTGATGTT
ATTTGACGCACAATAATCTCTATATGTCGATTATGAATCTGCACTCCCTGAGATCGATAA
ACTTTTTGGATCTTATTAACCAAAGAGATACGACTTTGCACTATAGTTAGCTCAGCACCA
ATCAAGAATCCCCAAGGAATTCCAAGAATTTTTGCTATACGCTCGTTCCAACCCTCAATC
CTCTTTTCTAGGTTCATCGATATTGAATCAATCGAACGAACTTCTAACACTTGTTCCACT
TTTGGAAGACCTTGCGTTATATCTCCAGATCTCTATTTTTCATATATAAATGTAACTAAC

D) tail fasta file

>W_SEK2_D15_scaffold_211474_c1
ATATTCCACTCCCATATCTGTGTTATTATTCGTTGCAATAAACCTTCATTACACTTTATA
TGATAGCACGACCATATTCCACTCCCATATCTGTGTTATTATTCGTTGCAATAAACCTTC
ATTACACTTTATATGATAGCACGACCATATTCCACTCCCATATCTGTGTTATTATTCGTT
GCAATAAACCTTCATTACAC
>W_SEK2_D15_scaffold_113313_c1
CAGCAGACCGTGATGTCTTACGCCTGTGTTGCCCTCTACCGCTATGCGGTTGGTAAGCCA
GTGCCAGGGTTCGACCCAACGGCTATGCAGGGAGCGTTCCGAGTGAAGAAGCAGAAGTTC
ACCGGACAAGCCGGAGCCTAATTAGCGCCTAGGGCCACTCCGCGAACGAGAGCCTTCTGG
AAGTTCAGGTAAATGAACAC

note: D0_SEK2_2 is a sample name that I replaced with 1 in the label column of the annotation file. I thought potentially this software wasn't reading the labels correctly due to the underscores or the combination of letters and numbers. However, when it is just numbers, I am failing to get anything.

Any help would be great,

Jonathan
Post Doc, UCSD

jrdickey9 commented 1 year ago

PS: Here is the png from the output with a csv formatted annotation file and the fasta input.

[Image: Screenshot 2023-04-17 at 3 54 45 PM]

jrdickey9 commented 1 year ago

Resolved: size-filter the FASTA file before loading it into VizBin, create the annotation file from the size-filtered FASTA, then select the same size filter (minimum contig length) in VizBin and proceed.
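
For anyone landing here with the same problem, here is a minimal Python sketch of that workaround. It drops short scaffolds first and then writes the annotation file from the scaffolds that remain, so the two files keep the same order and the same number of entries. Several things in it are assumptions rather than anything confirmed in this thread: the function name `filter_and_annotate` is made up for illustration, the sample name is taken to be everything before `_scaffold_` in the header, numeric labels are assigned in order of first appearance, and scaffolds strictly shorter than `min_length` are dropped (check that this matches the cutoff you select in VizBin).

```python
import csv


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file, in file order."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(seq):
    """Return GC content as a percentage of the sequence length."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def filter_and_annotate(fasta_in, fasta_out, annotation_out, min_length=2000):
    """Write a size-filtered FASTA plus a matching label,length,gc file.

    Returns the number of scaffolds kept and the sample -> label mapping.
    """
    labels = {}  # sample name -> numeric label, in order of first appearance
    kept = 0
    with open(fasta_out, "w") as fa, open(annotation_out, "w", newline="") as ann:
        writer = csv.writer(ann, lineterminator="\n")
        writer.writerow(["label", "length", "gc"])
        for header, seq in read_fasta(fasta_in):
            if len(seq) < min_length:
                continue  # dropped here, so it never gets an annotation row
            # Assumption: headers look like <sample>_scaffold_<n>_c<n>.
            sample = header.split("_scaffold_")[0]
            label = labels.setdefault(sample, len(labels) + 1)
            fa.write(f">{header}\n")
            for i in range(0, len(seq), 60):
                fa.write(seq[i:i + 60] + "\n")
            writer.writerow([label, len(seq), f"{gc_percent(seq):.2f}"])
            kept += 1
    return kept, labels
```

A call such as `filter_and_annotate("combined.fasta", "combined.min2000.fasta", "annotation.csv", min_length=2000)` (file names are placeholders) would produce the two files to load into VizBin with the same minimum contig length selected.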

claczny commented 1 year ago

Hi Jonathan,

thank you for the issue and great to see that you have been able to resolve it. I was off for a week, so could only reply now.

Indeed, this is the way that I'd have suggested to you too. It is a point where the UX could surely be improved, but, unfortunately, I currently do not have resources available that I could dedicate to this.

As you mentioned this to be host-associated, my "suspicion" is that the big cluster in the middle might be genomic fragments from the host. Or maybe the distorted "C"-shaped cluster at 12 o'clock 🤔 Unless you already filtered out reads beforehand, then this is a different story 😉

Should you have further questions, please do not hesitate to ask.

Best wishes and stay safe,

Cedric

jrdickey9 commented 1 year ago

Thanks Cedric!

Cheers, J
