Closed ga23981 closed 5 years ago
Using -C option also resulted in same error.
Hi, can you please share your pangenome_matrix_t0.tab file? Thanks
Thank you, Please find attached my matrix file.
Gaurav
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/eead-csic-compbio/get_homologues/issues/47?email_source=notifications&email_token=AMOL6INIVZWVMSWS4P7U5ELQJH5QRA5CNFSM4IVYFY3KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6RDUCI#issuecomment-530725385, or mute the thread https://github.com/notifications/unsubscribe-auth/AMOL6IKQNKC3HTVMX4QYC53QJH5QRANCNFSM4IVYFY3A .
Sorry, I did not get the attachment. Can you send it to my personal email or paste a URL here?
Please share your personal e-mail address.
Thank you Gaurav
Check it out at https://aem.asm.org/content/79/24/7696.long
https://drive.google.com/open?id=1oRH9oGT7saiDiZdh-Dr81ddlGC7Nj0uh
Above is the link to the file. Please have a look.
Thank you Gaurav
Thanks, I had a look at your file and the script and they seem fine. The problem is that you are feeding the wrong matrix to the script, as that script takes square matrices. I quote from the manual:
"plot_matrix_heatmap.sh calculates ordered heatmaps with attached row and column dendrograms from squared tab-separated numeric matrices, which can be presence/absence pangenomic matrices or similarity / identity matrices as those produced by get_homologues with flag -A. From the latter type of matrix a distance matrix can optionally be calculated to drive a neighbor joining tree. See example on section 4.8.1."
Perhaps what you really want is another script that will take your tab matrix just fine:
"hcluster_pangenome_matrix.sh generates a distance matrix out of a tab-separated presence/absence pangenome matrix, which is then used to call R functions hclust() and heatmap.2() in order to produce a heatmap. "
Good luck, Bruno
Hello Bruno,
Thank you for your response.
I tried what you suggested and got some heat map and phylogenetic tree plots. These are individual plots, not the combined figure shown in Fig. 12 of the manual. I used hcluster_pangenome_matrix.sh with the pangenome_matrix_t0.tab file. I wanted to plot a combined tree and heat map as shown in Fig. 12, which I believe is based on the core genome and accessory genome containing presence and absence variants. Correct me if I am wrong.
For my understanding, could you please help me with what the manual says about Fig. 12:
A complementary view of the same data can be obtained with script plot_matrix_heatmap.sh, which was called to produce Figure 12 http://eead-csic-compbio.github.io/get_homologues/manual/#fig:panheatmap
./plot_matrix_heatmap.sh -i sample_intersection/pangenome_matrix_t0.tab -o pdf \
-r -H 8 -W 14 -m 28 -t "sample pangenome (clusters=180)" -k "genes per cluster"
Heatmap of the previous pangenome matrix, with dendrograms sorting genomes according to cluster occupancy.
Thank you Gaurav
Hello Gaurav, thanks for your detailed explanation. I have now tried to replicate the command that creates Fig. 12 in the manual with your pangenome file. What I found is an issue that Felipe Lira had raised some time ago and that we had not taken care of yet: the .tab pangenome files created with compare_clusters.pl contain a trailing tab (\t) at the end of each line. I will correct that next week. In the meantime you can do this:
$ cut -f 1-14453 pangenome_matrix_t0.tab > pangenome_matrix_t0.fix.tab
and use the new file to call the script as explained in the manual. As your matrix has 14K columns, the resulting figure is not particularly nice. I would suggest selecting a subset of columns, perhaps by removing all redundant columns.
Bruno
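The column range in the cut command above is specific to this one matrix. As a more general sketch, assuming the only defect is a single trailing tab on each line, awk can strip it without knowing the column count. The file names demo_matrix.tab and demo_matrix.fix.tab and the tiny matrix are made up for illustration; they are not part of get_homologues.

```shell
# Hypothetical 2-genome, 3-cluster matrix with a trailing tab on every line,
# standing in for a real pangenome_matrix_t0.tab.
printf 'source\tc1\tc2\tc3\t\ngenomeA\t1\t0\t1\t\ngenomeB\t1\t1\t0\t\n' > demo_matrix.tab

# Remove one trailing tab per line, if present; everything else is untouched.
awk '{ sub(/\t$/, ""); print }' demo_matrix.tab > demo_matrix.fix.tab

# Sanity check: list the distinct field counts; a clean matrix prints one number.
awk -F'\t' '{ print NF }' demo_matrix.fix.tab | sort -u
```

If the last command prints more than one number, some rows are still ragged and the matrix probably needs a closer look before plotting.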
Many thanks, Bruno. Just one clarification, please: when you say redundant columns, do you mean columns with the same gene ID in the header?
Gaurav
Good morning, I was referring to columns with identical patterns of gene absence. That will make your matrix narrower, and thus the heatmap smaller, but I understand you might not want that, Bruno
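One way to drop columns with identical patterns is sketched below with awk. This is not a get_homologues script, only an illustration. It assumes a tab-separated matrix with genome names in the first column, cluster names in the first row, and no trailing tabs. The file names and the demo matrix are hypothetical.

```shell
# Hypothetical demo matrix: clusters c1 and c3 share the same pattern (1,0,1).
printf 'source\tc1\tc2\tc3\tc4\ngA\t1\t0\t1\t1\ngB\t0\t1\t0\t1\ngC\t1\t1\t1\t1\n' > demo_pangenome.tab

awk -F'\t' -v OFS='\t' '
{
    # Store every cell and build, per column, a key from its data-row values.
    for (j = 1; j <= NF; j++) {
        cell[NR, j] = $j
        if (NR > 1) pattern[j] = pattern[j] SUBSEP $j
    }
    ncol = NF
}
END {
    # Keep column 1 (genome names) plus the first column seen with each pattern.
    nkeep = 0
    keep[++nkeep] = 1
    for (j = 2; j <= ncol; j++)
        if (!(pattern[j] in seen)) { seen[pattern[j]] = 1; keep[++nkeep] = j }
    # Print the matrix again, restricted to the kept columns.
    for (i = 1; i <= NR; i++) {
        line = cell[i, keep[1]]
        for (k = 2; k <= nkeep; k++) line = line OFS cell[i, keep[k]]
        print line
    }
}' demo_pangenome.tab > demo_pangenome.nr.tab
```

In this demo the output keeps c1, c2 and c4 and drops c3. Columns are compared on their exact values, so two clusters with different gene counts in any genome are treated as different patterns.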
I have updated compare_clusters.pl; you can get the new version with git pull. Let me know if you have any trouble, Bruno
Thank you, Bruno!
Could you please help me with this one now: I need to plot core-genome and pan-genome plots as shown in Fig. 16 of the manual. To get these plots I am first running the following command, which should give me a .tab file, as suggested in section 4.8.4:
get_homologues.pl -d faa -c -M -n 25
The faa directory contains all the amino acid sequences of my bacterial strains. The above command will generate the faa_homologue folder, where I should find the .tab file to be used further.
I have two questions here:
Hi, can you please post this as a separate issue? I would need to know how many CPU cores you have, how many genomes you are analyzing and how many genes they have, Bruno
plot_matrix_heatmap.sh -i pangenome_matrix_t0.tab -o pdf -N -H 8 -W 14 -m 28 -v 28 -t "pangenome" -k "genes per cluster"
Plotting file pangenome_matrix_t0_heatmap.pdf
Error: There are less than four complete data rows. Pleae revise your input table! Execution halted
Please suggest what I should do to get an output.