claczny / VizBin

Repository of our application for human-augmented binning

where to find the log file? #31

Closed yoyohashao closed 8 years ago

yoyohashao commented 8 years ago

Hi, I am very new to this software. I am asking because when I tried to add the annotation, there was an error and I couldn't locate the log file on my MacBook.

Also, I would like to know if I can label just the 1st sequence in my file. I made a quite simple annotation file with only two lines:

isMarker
1

Could this annotation file cause the error? I don't have much coding experience, so I decided not to fill in the series of 0s after the 1.

Many thanks

claczny commented 8 years ago

Hi,

thank you for your interest in VizBin!

Could you maybe provide a screenshot or write out the text of the error message? Otherwise, I am a bit at a loss as it could be pretty much anything :) As for the log file, it is to be found at $HOME/.vizbin/logs/lcsb-vizbin.log on Mac OS X and Linux. On Windows, it should be at C:\Users\YOURUSERNAME\.vizbin\logs\lcsb-vizbin.log (if I remember correctly), where you have to replace YOURUSERNAME with your actual username.
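
For readers who prefer to locate the file from a script, here is a minimal Python sketch. It only assumes the directory layout given above (a `.vizbin/logs` folder inside the user's home directory); the function names `vizbin_log_path` and `last_lines` are made up for this illustration and are not part of VizBin.

```python
from pathlib import Path
from typing import List, Optional


def vizbin_log_path(home: Optional[Path] = None) -> Path:
    """Return where the log file is expected, per the layout described above.

    Assumption: the home directory reported by Python is the same one
    VizBin uses ($HOME on Mac OS X and Linux, C:\\Users\\<name> on Windows).
    """
    base = Path.home() if home is None else Path(home)
    return base / ".vizbin" / "logs" / "lcsb-vizbin.log"


def last_lines(path: Path, n: int = 20) -> List[str]:
    """Return the last n lines of a text file (the most recent log entries)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return text.splitlines()[-n:]
```

Calling `vizbin_log_path()` gives the path to check, and, if the file exists, `last_lines(vizbin_log_path())` returns the tail of the log, which is usually the part worth pasting into a bug report.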

Regarding the annotation file, VizBin requires an annotation for every sequence and in the order of the sequences. So, in your case, the first sequence in the FASTA file would represent the marker sequence and would be highlighted by a star-like shape in the VizBin plot. Did I get this right?

If all subsequent sequences are not supposed to be highlighted, you will still have to specify a 0 for each of them. The exception is of course the first one, which is your marker sequence and which, as you already did, has to be specified by a 1.

As a worked example, assuming you have 5 sequences in your FASTA file with all sequences above or equal to your desired length threshold, and you would like to highlight the first sequence, the annotation file would have to look like the following:

isMarker
1
0
0
0
0

If, for whatever reason, it is the last sequence you would like to see highlighted, the annotation file would have to look like the following:

isMarker
0
0
0
0
1
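
For files with many sequences, writing the zeros by hand is tedious. The following Python sketch generates the lines of such an annotation file. It assumes that every sequence in the FASTA file passes the length threshold (so nothing is filtered out) and that sequences are counted by their `>` header lines; the helper names are hypothetical and not part of VizBin.

```python
def count_fasta_records(fasta_path):
    """Count sequences by counting header lines (those starting with '>')."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def marker_annotation(n_sequences, marker_positions):
    """Build the annotation lines: a header, then one 0/1 per sequence.

    marker_positions holds 0-based positions (in FASTA order) of the
    sequences that should be highlighted.
    """
    markers = set(marker_positions)
    out_of_range = sorted(i for i in markers if not 0 <= i < n_sequences)
    if out_of_range:
        raise ValueError(f"marker positions out of range: {out_of_range}")
    return ["isMarker"] + ["1" if i in markers else "0" for i in range(n_sequences)]
```

With five sequences, `marker_annotation(5, [0])` reproduces the first example above and `marker_annotation(5, [4])` the second. Joining the returned lines with newlines and writing them to a text file gives the annotation file.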

Please let me know if this solves your issue or if you have further questions.

Best,

Cedric

yoyohashao commented 8 years ago

Hi Cedric,

Following your advice, VizBin gives no error this time. But in the visualisation, there's no marker.... My annotation file should tell it that the first sequence needs a marker. Also, is there any setting that can increase the separation of clusters in the visualisation?

thanks!

yoyohashao commented 8 years ago

Also I'd like to report another issue:

This happened when I tried to open a newly saved workspace. Last night this function was OK. Additionally, I found the cluster visualisation in the demo was multi-colour. How to assign different colours? I thought there was no colour option in the annotation file.

Thanks.

claczny commented 8 years ago

Good to hear that the error no longer appears.

Regarding your questions:

But in the visualisation, there's no marker....

It could be that you ran into an exceptional case. To make a long story short, it might be that your marker (star-like shape in black colour) is hidden underneath non-marker points. The default behaviour of the plotting is to plot points one after another and if they overlap, the "last"-plotted point will be displayed on top.

As a workaround, if possible, please put your marker sequence at the end of your FASTA file and adjust the annotation file accordingly, see also my first post.
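
The reordering can be done programmatically so that the FASTA records and the annotation values stay aligned. This is a minimal Python sketch under the assumption that the records are already loaded as a list of `(header, sequence)` tuples and the annotation values as a list of `"0"`/`"1"` strings in the same order; `markers_last` is a hypothetical helper, not a VizBin function.

```python
def markers_last(records, flags):
    """Move records flagged "1" behind all records flagged "0".

    The relative order inside each group is preserved, and the flags are
    reordered together with the records so both lists stay aligned.
    """
    if len(records) != len(flags):
        raise ValueError("need exactly one flag per record")
    paired = list(zip(records, flags))
    # list.sort() is stable, so records with the same flag keep their order.
    paired.sort(key=lambda pair: pair[1] == "1")
    return [record for record, _ in paired], [flag for _, flag in paired]
```

Both returned lists then need to be written back out: the records as the new FASTA file and the flags (below the `isMarker` header) as the new annotation file.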

also, is there any setting that can increase the separation of clusters in the visualisation?

The two-dimensional embedding is produced by BH-SNE and aims at faithfully reproducing the neighbourhood structure of the high-dimensional space. As such, the distances are all relative. Unfortunately, there is no easy way to increase the separation.

As a workaround, you could try using longer sequences, e.g., >= 2,000 nt. We typically see improved cluster separation in these cases as, informally speaking, the longer the contig, the better the consensus. Please note that this is an empirically motivated statement and may not hold in every case, although we find that it generally does. Also, the fewer sequences there are, the smaller the diameter of the clusters, as fewer points need to be positioned in one plot.
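
Selecting the longer sequences before loading them can be sketched in a few lines of Python. The example assumes records held as `(header, sequence)` tuples and, optionally, a list of annotation values in the same order; the 2,000 nt default mirrors the example threshold above, and `filter_by_length` is a hypothetical helper name.

```python
def filter_by_length(records, min_length=2000, annotations=None):
    """Keep records whose sequence has at least min_length nucleotides.

    If annotations are given (one value per record, same order), the values
    belonging to dropped records are dropped as well, so that the two lists
    still match in number and order afterwards.
    """
    if annotations is not None and len(annotations) != len(records):
        raise ValueError("need exactly one annotation value per record")
    keep = [i for i, (_, seq) in enumerate(records) if len(seq) >= min_length]
    kept_records = [records[i] for i in keep]
    kept_annotations = None
    if annotations is not None:
        kept_annotations = [annotations[i] for i in keep]
    return kept_records, kept_annotations
```

Filtering the annotation values together with the sequences matters because the annotation file has to list exactly one value per remaining sequence.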

this happened when i tried to open a newly saved workspace. Last night this function was ok.

My apologies but I do not see any problem stated here. Could you please elaborate?

Additionally, I found the cluster visualisation in demo was multi-colour. How to assign different colours? I thought there was no colour option in annotation file.

Various colours and shapes are used when specifying "label" annotations. For example, your annotation file could look like this:

label
A
A
A
B
B

This would assign the label "A" to the first three contigs and the label "B" to the last two. VizBin cycles through colours and shapes to allow the use of multiple labels, but using too many would result in poor plots. After all, how many distinct colour-shape combinations can one easily differentiate ;) Typically this works fine for around 5 - 10 colours. On this topic, please also see Issue #22 (https://github.com/claczny/VizBin/issues/22).

Also, it does not matter what your labels are called, meaning that "A" could also be "foo" and "B" could be "bar", or numerical representations. These are simply supposed to be from a reasonably "small" set of categorical variables.
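
If the labels come from another tool as a mapping from sequence name to label, the annotation lines can be generated rather than typed. The Python sketch below assumes such a mapping exists as a dictionary; the function name `label_annotation` and the warning threshold of 10 (taken from the rough "5 - 10" guidance above) are illustrative choices, not VizBin settings.

```python
import warnings


def label_annotation(sequence_ids, label_of, max_labels=10):
    """Build the annotation lines: a "label" header, then one label per sequence.

    sequence_ids must be in the same order as the sequences in the FASTA file.
    label_of maps each sequence ID to its label (any string or number).
    """
    missing = [seq_id for seq_id in sequence_ids if seq_id not in label_of]
    if missing:
        raise KeyError(f"no label for: {missing}")
    labels = [str(label_of[seq_id]) for seq_id in sequence_ids]
    n_distinct = len(set(labels))
    if n_distinct > max_labels:
        # Many labels means many colour-shape combinations, which are hard to read.
        warnings.warn(f"{n_distinct} distinct labels may be hard to tell apart")
    return ["label"] + labels
```

For five contigs named `c1` to `c5`, with the first three mapped to "A" and the last two to "B", the result is the same list of lines as in the example above.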

Again, kindly let me know if this solves your issues. Thank you.

yoyohashao commented 8 years ago

Thanks for the prompt reply, Cedric. About that unseen marker: I added 4 more, but they are all invisible...

And about that workspace issue, I opened a new post.

I will try to use longer sequences and see if I can find the stars or other labels then.

Best,

Fang Liu

claczny commented 8 years ago

Did you try to put the sequence(s) as well as the label(s) at the ends of the respective files instead of at the beginning?

yoyohashao commented 8 years ago

Hello Cedric, once I picked only sequences longer than 2000 nt, I could finally get a nice figure. And the stars finally came up.

Now I have another question: normally we use the presence of essential genes to infer the completeness of a bin. The more essential genes it contains, the more complete it is. Yet I noticed that in the VizBin paper, all the genomes derived from VizBin contained less than 50 of 107 essential genes. Could you comment a bit on this? Personally, I think it comes down to the library construction, the sequencing depth of the metagenome, the quality of the assembly, etc. Did the reviewer ask about the completeness of VizBin clusters?

Many thanks. Best,

Fang Liu

claczny commented 8 years ago

if I only picked sequences longer than 2000nt, I could finally get a nice figure

Good to hear that. I will consider this issue closed but discussion can continue.

Did you apply any size selection outside of VizBin when using the labels? If so, this is something we are working on, see Issue #3. In summary, when using annotation information, the sequence file should only contain the sequences of interest, i.e., it should already be size-selected when loaded into VizBin. The annotation file must match the sequence file in terms of the number and order of the sequences. Please make sure that this is the case, as currently no automated checks are integrated.
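
Until such checks exist, a small script run before loading the files can catch a count mismatch. This Python sketch assumes the annotation file has a single header line (such as `isMarker` or `label`) followed by one value per line, as in the examples earlier in this thread; `check_annotation_matches` is a hypothetical helper. It can verify the number of entries only, not their order.

```python
def check_annotation_matches(fasta_path, annotation_path):
    """Raise ValueError unless the annotation has one value per FASTA sequence.

    Returns the number of sequences when the counts agree.
    """
    with open(fasta_path) as handle:
        n_sequences = sum(1 for line in handle if line.startswith(">"))
    with open(annotation_path) as handle:
        rows = [line.strip() for line in handle if line.strip()]
    if not rows:
        raise ValueError("annotation file is empty")
    n_values = len(rows) - 1  # the first non-empty line is the header
    if n_values != n_sequences:
        raise ValueError(
            f"{n_sequences} sequences but {n_values} annotation values"
        )
    return n_sequences
```

An annotation file containing only a header and a single `1`, checked against a FASTA file with five sequences, would be rejected by this function.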

all the genomes derived from vizbin contained less than 50 of 107 essential genes

In the VizBin manuscript, we used publicly available data and, thus, had no influence on the sequencing depth, assembly, etc. You are right that several factors can influence the quality of the reconstruction of population-level genomes from metagenomic data. It is absolutely expected that there will be some genomes with high degrees of completeness, while the majority will be incomplete unfortunately. You might also want to have a look at http://www.nature.com/articles/srep04516 (the manuscript describing the methodology underlying VizBin), specifically http://www.nature.com/articles/srep04516/tables/2, to see that indeed highly complete genomes can be recovered, as well as http://www.nature.com/ncomms/2014/141126/ncomms6603/full/ncomms6603.html for more on this. The assemblies have to be "good" in the first place as otherwise there is no chance of achieving a "good" genomic recovery, with VizBin or any other tool :)

Internally, we routinely assess the completeness of clusters as we are typically interested in recovered genomes that should be as complete as possible. In particular for functional analysis and interpretation it is important to know if one has well "sampled" a population-level genome or not, as otherwise any interpretation might simply be based on incomplete information.