eead-csic-compbio / get_homologues

GET_HOMOLOGUES: a versatile software package for pan-genome analysis

Error with plot_matrix_heatmap.sh #47

Closed ga23981 closed 5 years ago

ga23981 commented 5 years ago

plot_matrix_heatmap.sh -i pangenome_matrix_t0.tab -o pdf -N -H 8 -W 14 -m 28 -v 28 -t "pangenome" -k "genes per cluster"

Plotting file pangenome_matrix_t0_heatmap.pdf

Error: There are less than four complete data rows. Pleae revise your input table! Execution halted

ERROR: file pangenome_matrix_t0_heatmap.pdf was NOT produced.

You can try option -C or alternatively remove columns in the matrix.

ERROR: file pangenome_matrix_t0_BioNJ.ph was NOT produced!

ERROR: file ANDg_meand_silhouette_width_statistic_plot.pdf was NOT produced!

ERROR: file was NOT produced!

Please suggest what I should do to get an output.

ga23981 commented 5 years ago

Using the -C option also resulted in the same error.

brunocontrerasmoreira commented 5 years ago

Hi, can you please share your pangenome_matrix_t0.tab file? Thanks

ga23981 commented 5 years ago

Thank you. Please find my matrix file attached.

Gaurav


brunocontrerasmoreira commented 5 years ago

Sorry, I did not get the attachment. Can you send it to my personal email or paste a URL here?

ga23981 commented 5 years ago

Please share your personal e-mail address.

Thank you Gaurav


brunocontrerasmoreira commented 5 years ago

Check it out at https://aem.asm.org/content/79/24/7696.long

ga23981 commented 5 years ago

https://drive.google.com/open?id=1oRH9oGT7saiDiZdh-Dr81ddlGC7Nj0uh

Above is the link to the file. Please have a look.

Thank you Gaurav


eead-csic-compbio commented 5 years ago

Thanks, I had a look at your file and the script and they seem just fine. The problem is that you're feeding the wrong matrix to the script, as that script takes square matrices. I quote from the manual:

"plot_matrix_heatmap.sh calculates ordered heatmaps with attached row and column dendrograms from squared tab-separated numeric matrices, which can be presence/absence pangenomic matrices or similarity / identity matrices as those produced by get_homologues with flag -A. From the latter type of matrix a distance matrix can optionally be calculated to drive a neighbor joining tree. See example on section 4.8.1."

Perhaps what you really want is another script that will take your tab matrix just fine:

"hcluster_pangenome_matrix.sh generates a distance matrix out of a tab-separated presence/absence pangenome matrix, which is then used to call R functions hclust() and heatmap.2() in order to produce a heatmap. "

Good luck, Bruno
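Before choosing between the two scripts, it can help to look at the shape of the matrix. The following is a minimal sketch and not part of GET_HOMOLOGUES; the function name matrix_shape is hypothetical. It only assumes a tab-separated text file and counts the header row and the label column like any other row or column.

```sh
# Hypothetical helper (not part of GET_HOMOLOGUES): report the number of
# rows and tab-separated columns in a matrix file, and whether any row has
# a different number of fields than the first row (ragged=1).
matrix_shape() {
  awk -F'\t' '
    NR == 1    { cols = NF }      # field count of the first (header) row
    NF != cols { ragged = 1 }     # some row has a different field count
    END        { printf "rows=%d cols=%d ragged=%d\n", NR, cols, ragged }
  ' "$1"
}
```

For example, `matrix_shape pangenome_matrix_t0.tab` prints one summary line. Note that a trailing tab at the end of a line is counted as one extra, empty column.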

ga23981 commented 5 years ago

Hello Bruno,

Thank you for your response.

I tried what you suggested and got some heatmap plots and a phylogenetic tree. These are individual plots, though, not what is shown in Fig. 12 of the manual. I used hcluster_pangenome_matrix.sh with the pangenome_matrix_t0.tab file. I wanted to plot a combined tree and heatmap as shown in Fig. 12, which I believe is based on the core genome and accessory genome containing presence and absence variants. Correct me if I am wrong.

For my understanding, could you please help me with what is mentioned for Fig. 12 in the manual:

A complementary view of the same data can be obtained with script plot_matrix_heatmap.sh, which was called to produce Figure 12 http://eead-csic-compbio.github.io/get_homologues/manual/#fig:panheatmap

./plot_matrix_heatmap.sh -i sample_intersection/pangenome_matrix_t0.tab -o pdf \
-r -H 8 -W 14 -m 28 -t "sample pangenome (clusters=180)" -k "genes per cluster"

Heatmap of the previous pangenome matrix, with dendrograms sorting genomes according to cluster occupancy.

Thank you Gaurav


eead-csic-compbio commented 5 years ago

Hello Gaurav, thanks for your detailed explanation. I have now tried to replicate the command that creates Fig. 12 in the manual with your pangenome file. What I found is an issue that Felipe Lira had raised some time ago and that we had not taken care of yet: pangenome .tab files created with compare_clusters.pl contain a trailing tab (\t) at the end of each line. I will correct that next week. In the meantime you can do this:

$ cut -f 1-14453 pangenome_matrix_t0.tab > pangenome_matrix_t0.fix.tab

and use the new file to call the script as explained in the manual. As your matrix has 14K columns, the resulting figure is not particularly nice. I would suggest selecting a subset of columns, perhaps removing all redundant columns.
Bruno
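The cut command above hard-codes the column count of this particular matrix. As a hedged alternative that does not depend on the number of columns, the sketch below strips trailing tabs with awk. The two function names are hypothetical and not part of GET_HOMOLOGUES, and the sketch assumes Unix line endings.

```sh
# Hypothetical helper (not part of GET_HOMOLOGUES): remove trailing tabs
# from every line of a tab-separated matrix and write the result to stdout.
strip_trailing_tabs() {
  awk '{ sub(/\t+$/, ""); print }' "$1"
}

# Hypothetical helper: count the lines that end with a tab, to check
# whether a file is affected before or after the fix.
count_trailing_tabs() {
  awk '/\t$/ { n++ } END { print n + 0 }' "$1"
}
```

A possible use is `strip_trailing_tabs pangenome_matrix_t0.tab > pangenome_matrix_t0.fix.tab`, followed by `count_trailing_tabs pangenome_matrix_t0.fix.tab` to confirm that no line still ends with a tab.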

ga23981 commented 5 years ago

Many thanks, Bruno. Just one clarification, please. When you say redundant columns, do you mean columns with the same gene ID header?

Gaurav


eead-csic-compbio commented 5 years ago

Good morning, I was referring to columns with identical patterns of gene absence. Removing them will make your matrix narrower, and thus the heatmap smaller, but I understand you might not want that. Bruno
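One way to drop columns with identical patterns could look like the sketch below. It is not part of GET_HOMOLOGUES and the function name dedup_columns is hypothetical. It assumes that the first row is a header, that the first column holds row labels, that the matrix fits in memory, and that trailing tabs have already been removed (otherwise the empty last column is treated as one more pattern). Of each group of identical columns only the first is kept.

```sh
# Hypothetical helper (not part of GET_HOMOLOGUES): keep only the first of
# each group of columns that have identical values in all data rows.
# Assumes: row 1 is a header, column 1 holds row labels, tab-separated input.
dedup_columns() {
  awk -F'\t' '
    {
      for (j = 1; j <= NF; j++) cell[NR, j] = $j   # store the whole matrix
      if (NF > nf) nf = NF
      nr = NR
    }
    END {
      for (j = 1; j <= nf; j++) {
        key = ""
        # the pattern of a column is built from data rows only (row 2 onwards)
        for (i = 2; i <= nr; i++) key = key SUBSEP cell[i, j]
        if (j == 1) {
          keep[++n] = j                            # always keep the label column
        } else if (!(key in seen)) {
          seen[key] = 1
          keep[++n] = j                            # first column with this pattern
        }
      }
      for (i = 1; i <= nr; i++) {
        line = cell[i, keep[1]]
        for (k = 2; k <= n; k++) line = line "\t" cell[i, keep[k]]
        print line
      }
    }
  ' "$1"
}
```

For instance, `dedup_columns pangenome_matrix_t0.fix.tab > pangenome_matrix_t0.dedup.tab` would write the narrower matrix to a new file; the output file name here is only an example.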

brunocontrerasmoreira commented 5 years ago

I have updated compare_clusters.pl; you can get the new version with git pull. Let me know if you have any trouble, Bruno

ga23981 commented 5 years ago

Thank you, Bruno!

Could you please help me with this one now: I need to plot core-genome and pan-genome plots as shown in Fig. 16 of the manual. In order to get these plots I am first executing the following script, as suggested in 4.8.4, which will give me a .tab file:

get_homologues.pl -d faa -c -M -n 25

The faa directory contains all the amino acid sequences of my bacterial strains. The above script will generate the faa_homologue folder, where I should find the .tab file to be used further.

I have two questions here:

  1. The above script has been running for the last four days and hasn't finished. Does it take this much time? I have assigned 40 GB of memory, 200 CPU hours and 25 threads.
  2. Do I have to use only the .tab file produced by the above script as input, in the following way?

     plot_pancore_matrix.pl -i xyz.tab -f core_Tettelin

     Here xyz.tab is the file that is expected to be generated by get_homologues.pl as shown above. Or can I use any other pangenome .tab file already generated by the scripts I used earlier?

Thank you, Gaurav

eead-csic-compbio commented 5 years ago

Hi, can you please post this as a separate issue? I would need to know how many CPU cores you have, how many genomes you are analyzing and how many genes they have. Bruno