campbio / musicatk

Mutational Signature Comprehensive Analysis Toolkit

musicatk::create_musica() from a data frame leads to only NN as variant alleles #52

Open ThomasGro opened 2 years ago

ThomasGro commented 2 years ago

Hi, I ran musicatk::create_musica() on a data frame. This results in a count table with only NN as variant alleles, and I am not sure how to get the correct variant alleles into it. It also leads to downstream errors in the signature detection step.

    head(dbs.df)
      chr  start    end ref alt               sample
    1   1 104017 104018  CC  TT ONCOLEAD_CELL_CAPAN1
    2   1 149875 149876  GC  CG ONCOLEAD_CELL_CAPAN1
    3   1 232961 232962  TG  CA ONCOLEAD_CELL_CAPAN1
    4   1 362904 362905  TT  GG ONCOLEAD_CELL_CAPAN1

    g=select_genome("hg19")
    dbs_musica <- create_musica(x = dbs.df, genome = g)
    build_standard_table(dbs_musica, g, "DBS78", overwrite = TRUE)
    Building count table from DBS with DBS78 schema

    head(dbs_musica@count_tables$DBS78@annotation)
                motif mutation context
    AC>NN_CA AC>NN_CA    AC>NN      CA
    AC>NN_CG AC>NN_CG    AC>NN      CG
    AC>NN_CT AC>NN_CT    AC>NN      CT
    AC>NN_GA AC>NN_GA    AC>NN      GA
    AC>NN_GG AC>NN_GG    AC>NN      GG
    AC>NN_GT AC>NN_GT    AC>NN      GT

    musica.result <- discover_signatures(musica = dbs_musica, table_name = "DBS78", num_signatures = 3, algorithm = "lda", nstart = 10, par_cores=8)
    Error in colSums(counts_table) : 'x' must be an array of at least two dimensions
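The `colSums` message above is a base R error that is raised whenever the argument is a plain vector rather than a matrix. As a minimal sketch in base R only (not taken from musicatk's code), one common way a count matrix loses its dimensions is single-column subsetting without `drop = FALSE`. Whether this is what happens inside `discover_signatures()` here is an assumption that this thread does not confirm; the matrix `counts` and its values are made up for illustration.

```r
# Hypothetical one-sample count matrix; the values are invented for illustration.
counts <- matrix(
  c(3, 0, 1),
  ncol = 1,
  dimnames = list(c("AC>NN_CA", "AC>NN_CG", "AC>NN_CT"), "sample1")
)

dropped <- counts[, 1]              # dimensions are dropped: plain numeric vector
kept <- counts[, 1, drop = FALSE]   # dimensions are kept: still a 3 x 1 matrix

is.null(dim(dropped))
dim(kept)

# colSums() rejects the vector but accepts the one-column matrix.
msg <- tryCatch(colSums(dropped), error = function(e) conditionMessage(e))
msg
colSums(kept)
```

Checking `dim()` on the count table before calling the signature discovery step is a cheap way to rule this case in or out.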

achevali commented 2 years ago

Hi Thomas, it looks like your steps should work. To see what might be going on, take a look at the count table with head(dbs_musica@count_tables$DBS78@count_table). If the count table data is not sensitive, you can post it here so we can diagnose the problem.

It's possible that the values in your chr column need to be of the form chr1 rather than 1. You can try changing that and see whether it solves your issue.
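A minimal sketch of that change in base R, assuming the data frame has the same columns as the `dbs.df` shown earlier in the thread (the rows below are copied from that output). It is not confirmed in this thread that the prefix is the cause of the problem.

```r
# Rows copied from the head(dbs.df) output earlier in this thread.
dbs.df <- data.frame(
  chr = c("1", "1", "1"),
  start = c(104017, 149875, 232961),
  end = c(104018, 149876, 232962),
  ref = c("CC", "GC", "TG"),
  alt = c("TT", "CG", "CA"),
  sample = "ONCOLEAD_CELL_CAPAN1",
  stringsAsFactors = FALSE
)

# Add the "chr" prefix only where it is missing, so rerunning is harmless.
chr <- as.character(dbs.df$chr)
needs_prefix <- !startsWith(chr, "chr")
chr[needs_prefix] <- paste0("chr", chr[needs_prefix])
dbs.df$chr <- chr

unique(dbs.df$chr)
```

After this, the musica object would need to be rebuilt from the modified data frame with the same `create_musica()` call as before.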

Please let me know if either of those helps!

ThomasGro commented 2 years ago

Hi, Thank you for your swift reply.

Since I get the same output when I start from the VCF file, maybe I misunderstand the format of the count table and annotation:

                motif mutation context
    AC>NN_CA AC>NN_CA    AC>NN      CA

My interpretation is that the mutation is NN in this example and that the 3′ and 5′ bases are C and A. That would mean that the variant allele is not defined? Or is the variant allele information in the ‘context’ field?

Thank you,
Thomas

achevali commented 2 years ago

The DBS motifs are defined here: https://cancer.sanger.ac.uk/signatures/dbs/

But I would highly recommend viewing dbs_musica@count_tables$DBS78@count_table, as this will enumerate the exact motifs and counts in a human-readable format.

If you have many samples, you may want to try dbs_musica@count_tables$DBS78@count_table[, 1:3] to view just a few samples' counts.
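To relate the motif labels in this thread to the COSMIC DBS names, the labels can be split into a reference and an alternate doublet. This is a minimal sketch in base R, assuming the labels follow the pattern `<ref>>NN_<alt>` (reference doublet, a literal `NN` placeholder, then the alternate doublet). That reading is inferred from the annotation output shown above and the COSMIC naming, not from musicatk's documentation; the column name `cosmic_label` is made up for illustration.

```r
# Motif labels copied from the annotation output in this thread.
motifs <- c("AC>NN_CA", "AC>NN_CG", "AC>NN_CT")

# Assumed layout: <ref doublet> ">NN_" <alt doublet>.
pattern <- "^([ACGT]{2})>NN_([ACGT]{2})$"
stopifnot(all(grepl(pattern, motifs)))

parsed <- data.frame(
  motif = motifs,
  ref = sub(pattern, "\\1", motifs),
  alt = sub(pattern, "\\2", motifs),
  stringsAsFactors = FALSE
)

# COSMIC-style label, e.g. "AC>CA" for reference AC mutated to CA.
parsed$cosmic_label <- paste0(parsed$ref, ">", parsed$alt)
parsed
```

Under this reading, the variant allele is the part after the underscore, so the `NN` in the label does not mean the alternate allele is undefined.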

ThomasGro commented 2 years ago

    dbs_musica@count_tables$DBS78@count_table[, 1:3]
    Error in dbs_musica@count_tables$DBS78@count_table[, 1:3] : subscript out of bounds

    dbs_musica@count_tables$DBS78@annotation[, 1:3]
                motif mutation context
    AC>NN_CA AC>NN_CA    AC>NN      CA
    AC>NN_CG AC>NN_CG    AC>NN      CG
    AC>NN_CT AC>NN_CT    AC>NN      CT

The latter is what led to my confusion. But I understand now that you use the annotation as provided by COSMIC, which corresponds to your ‘motif’ column. The ‘mutation’ column in ‘annotation’ is still confusing, and maybe should be changed to:

                motif mutation context
    AC>NN_CA AC>NN_CA    AC>CA      NN
    AC>NN_CG AC>NN_CG    AC>CG      NN
    AC>NN_CT AC>NN_CT    AC>CT      NN

Thank you.
Thomas