d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Updated analysis: Modules that use RMTL/PMTL file #219

Closed runjin326 closed 3 years ago

runjin326 commented 3 years ago

What analysis module should be updated and why?

Now that the PR changing RMTL to PMTL is merged, we need to update the modules that directly read this file (or use the long-format-table-utils module): cnv-frequencies, snv-frequencies, fusion-frequencies, and rna-seq-expression-summary-stats.

What changes need to be made? Please provide enough detail for another participant to make the update.

For all the above modules, do the following

What input data should be used? Which data were used in the version being updated?

v9/v10. Either re-run now and merge, or re-run with the changes after v10 is released.

When do you expect the revised analysis will be completed?

1-2 days.

Who will complete the updated analysis?

@ewafula

ewafula commented 3 years ago

@runjin326, can you merge the PMTL updates for the ensg-hugo-pmtl-mapping.tsv file and the long-format-table-utils module, so that I can update the dependent modules you have listed in the ticket?

runjin326 commented 3 years ago

@ewafula, it is already merged :) as I mentioned in the ticket. You should be able to get it once you pull dev and merge.

sangeetashukla commented 3 years ago

@runjin326 The ensg-hugo-pmtl-mapping.tsv is part of the other PR that is pending final review and merge. When @afarrel or @chinwallaa have reviewed it, I will reach out to you to merge it.

ewafula commented 3 years ago

@runjin326, @logstar, I am encountering assertion errors when running the rna-seq-expression-summary-stats module with the updated ensg-hugo-pmtl-mapping.tsv mapping file. The gene symbols in the new PMTL file likely conflict with those in the expression matrix. What would you recommend we do to address this issue?

Error in get_output_ss_df(x, gsb_gid_df) : 
  all(rownames(ss_df) %in% rownames(gsb_gids_conv_df)) is not TRUE
Calls: get_expression_summary_stats_out_dfs -> lapply -> FUN -> get_output_ss_df -> stopifnot
Execution halted
Error in get_output_ss_df(x, gsb_gid_df) : 
 all(rownames(ss_df) %in% rownames(gsb_gids_conv_df)) is not TRUE
Calls: get_expression_summary_stats_out_dfs -> lapply -> FUN -> get_output_ss_df -> stopifnot
Execution halted
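
The failing check only reports that some row names of the summary table are missing from the conversion table, not which ones. As a rough Python sketch of the same membership test (the module itself is R; `missing_row_ids` and the IDs below are hypothetical and not taken from the module), listing the missing IDs can make the failure easier to diagnose:

```python
def missing_row_ids(summary_ids, mapping_ids):
    """Return the IDs in summary_ids that are absent from mapping_ids, in order."""
    known = set(mapping_ids)
    return [row_id for row_id in summary_ids if row_id not in known]


# Hypothetical IDs for illustration only.
summary_ids = ["GENE_A", "GENE_B", "GENE_C"]
mapping_ids = ["GENE_A", "GENE_C"]

missing = missing_row_ids(summary_ids, mapping_ids)
if missing:
    print("Row IDs missing from the mapping table:", missing)
```

An empty result corresponds to the assertion passing; the thread does not establish which IDs were actually missing.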

Here are the new gene symbols in the ensg-hugo-pmtl-mapping.tsv table that are absent from the ensg-hugo-rmtl-mapping.tsv table:

ensg_id           gene_symbol               pmtl                       version
ENSG00000198793   mTOR                      Relevant Molecular Target  PMTL version 1.1
ENSG00000007312   CD79b                     Relevant Molecular Target  PMTL version 1.1
Symbol_Not_Found  GD2 (Disialoganglioside)  Relevant Molecular Target  PMTL version 1.1
Symbol_Not_Found  DNA (alkylators)          Relevant Molecular Target  PMTL version 1.1

@sangeetashukla, you could delete the first two (mTOR and CD79b) because they are just case variants of MTOR and CD79B, which are already in both tables. I am not sure about the last two (GD2 and DNA), which are more than gene symbols and don't have corresponding ENSG IDs.

Cc @afarrel , @chinwallaa
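
The comparison described above can be sketched in Python as follows. This is only an illustration, not the code used for the ticket: `read_mapping` and `classify_new_symbols` are hypothetical helpers, the old table is reduced to two columns, and the inline rows are toy data built from the symbols quoted in this comment.

```python
import csv
import io


def read_mapping(tsv_text):
    """Parse a tab-separated mapping table into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def classify_new_symbols(new_rows, old_rows):
    """Group symbols that appear only in new_rows by the kind of problem they pose."""
    old_symbols = {row["gene_symbol"] for row in old_rows}
    old_upper = {symbol.upper() for symbol in old_symbols}
    result = {"case_variant": [], "no_ensg_id": [], "other": []}
    for row in new_rows:
        symbol = row["gene_symbol"]
        if symbol in old_symbols:
            continue  # already present in the old table
        if not row["ensg_id"].startswith("ENSG"):
            result["no_ensg_id"].append(symbol)
        elif symbol.upper() in old_upper:
            result["case_variant"].append(symbol)
        else:
            result["other"].append(symbol)
    return result


# Toy stand-ins for the two mapping files (assumed layout, tab-separated).
old_tsv = (
    "ensg_id\tgene_symbol\n"
    "ENSG00000198793\tMTOR\n"
    "ENSG00000007312\tCD79B\n"
)
new_tsv = (
    "ensg_id\tgene_symbol\tpmtl\tversion\n"
    "ENSG00000198793\tMTOR\tRelevant Molecular Target\tPMTL version 1.1\n"
    "ENSG00000198793\tmTOR\tRelevant Molecular Target\tPMTL version 1.1\n"
    "ENSG00000007312\tCD79B\tRelevant Molecular Target\tPMTL version 1.1\n"
    "ENSG00000007312\tCD79b\tRelevant Molecular Target\tPMTL version 1.1\n"
    "Symbol_Not_Found\tGD2 (Disialoganglioside)\tRelevant Molecular Target\tPMTL version 1.1\n"
    "Symbol_Not_Found\tDNA (alkylators)\tRelevant Molecular Target\tPMTL version 1.1\n"
)

groups = classify_new_symbols(read_mapping(new_tsv), read_mapping(old_tsv))
for kind, symbols in groups.items():
    print(kind, symbols)
```

On this toy input, mTOR and CD79b land in the case-variant group and the two entries without ENSG IDs land in their own group, which mirrors the split suggested in the comment.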

sangeetashukla commented 3 years ago

@ewafula My previous PR is now closed. Since I encountered errors reading some files directly from the S3 bucket, @runjin326 created this PR; once it is merged, you can use the new ensg-hugo-pmtl-mapping.tsv. cc @afarrel @chinwallaa

ewafula commented 3 years ago

@sangeetashukla, thank you!

runjin326 commented 3 years ago

Closing with PR128, PR130, PR133, PR137 merged.