SamGa3 / microbiome_reconstruction

GNU General Public License v3.0

Problem #2

Status: Closed (Susan0929 closed 1 month ago)

Susan0929 commented 1 year ago

During the execution of the second step, we encountered a problem: "A USER ERROR has occurred: Failed to read bam header from data/RNAseq/input_bam/example1/ERR2756905.bam Caused by:data/RNAseq/input_bam/example1/ERR2756905.bam". We attempted the procedure on two different computers, and both produced the same outcome. Kindly provide insight on the potential cause of the problem.

SamGa3 commented 1 year ago

Hi Susan0929, I checked the tutorial and this error doesn't occur for me. Anyway, this is a common error that occurs when you input the wrong path to PathSeq (I've reproduced your error by writing the wrong path -no example1 folder- as input to PathSeq, check the picture). I also noticed from your picture that you use a different path. Let me know if this is the problem.

Gaia

[Screenshot from 2023-10-06 15-26-57]
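A quick pre-flight check along these lines can catch a wrong input path before PathSeq is launched. This is only a sketch in Python and not part of the repository: the helper name `check_bam` is hypothetical, and it relies on BAM files being BGZF (gzip-compatible) compressed and starting with the magic bytes `BAM\1` once decompressed.

```python
import gzip
import zlib
from pathlib import Path


def check_bam(path):
    """Return (ok, reason) for a candidate BAM path (hypothetical helper)."""
    p = Path(path)
    # The most common cause in this thread: the path simply does not exist
    # relative to the directory the command is launched from.
    if not p.is_file():
        return False, f"not a file: {p} (cwd is {Path.cwd()})"
    try:
        # BGZF is a series of gzip members, so the gzip module can read it.
        with gzip.open(p, "rb") as handle:
            magic = handle.read(4)
    except (OSError, EOFError, zlib.error) as exc:
        return False, f"cannot decompress {p}: {exc}"
    if magic != b"BAM\x01":
        return False, f"missing BAM magic bytes in {p}"
    return True, "ok"


if __name__ == "__main__":
    # Path taken from the error message reported above.
    print(check_bam("data/RNAseq/input_bam/example1/ERR2756905.bam"))
```

If the first branch fires, the printed working directory usually shows at a glance why the relative path did not resolve.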

Susan0929 commented 1 year ago

Dear Gaia: Thank you very much for your response and your reminder. The path was indeed the problem, and it is now resolved. However, continuing with the workflow, we ran into another issue with unamb_score_norm_tab="../../data/RNAseq/bacteria/raw/merged_unamb_score_norm/example/example_bacteria_species_merged_unamb_score_norm.txt". Do we need to generate this file ourselves? Is it produced by merging with merge.dmp? When I open merge.dmp, I only see numbers. Could you provide more detailed instructions on how to generate this file? An error also occurs in the step "../R-3.6.1/bin/Rscript scripts/microbes_values/microbiome_estimation_commands.R". Is it the same problem? I look forward to your reply. Thank you very much.
Best Regards, Susan

王姝洁


SamGa3 commented 1 year ago

Hi Susan,

In step 4, you create the unamb_score_norm_tab from the unambiguous and score tables, which are built from the PathSeq output (in step 3). If you are analyzing TCGA cancer types mentioned in my paper, these tables are already provided. However, if you are working with your own data, you will need to create them by modifying step 3. The merge.dmp file is supplied by PathSeq and lists the taxa that need to be merged. The script utilizes this file, so there's no need for you to open it. I've already executed ../R-3.6.1/bin/Rscript scripts/microbes_values/microbiome_estimation_commands.R but I didn't encounter any errors during the execution. Could you please provide more specific details about the problem you're facing, or share any error logs if available?

Gaia

Susan0929 commented 1 year ago
Dear Gaia: Thank you very much for your patience. I am currently replicating your results, and under your guidance, we have reached step 7. However, there seems to be a possible error in the R programming code. Could you please help us identify the potential issue? If we can successfully replicate your results, we would be happy to provide an additional Chinese guide as a supplement to expand its influence. We look forward to your response. Best Regards, Susan 王姝洁


SamGa3 commented 1 year ago

Hi Susan,

thank you for reporting the issue. I have now fixed the problem (a repeated chunk name), allowing you to replicate the results. Please pull my GitHub repository again and rerun step 7. My supervisor is currently out of the office, but I will discuss your suggestion with him. Feel free to reach out if you need any further help.

Gaia

Susan0929 commented 1 year ago

Dear Gaia: Thank you very much for your response. After you corrected the issue in step 7, we continued running the workflow and encountered a few small problems. Here is a summary:

1. In step 8, lines 481 to 532 of survival_analysis_commands.R: replace ". /. /" with "".
2. In step 10, line 158 of identification_related_species_wilc_commands.R: the word "species" is misspelled as "speciese".
3. In step 12, the command "humann_split_stratified_table --input data/RNAseq/humann_output/right/COAD_selectedTumor_right_pathabundance.tsv --output data/RNAseq/humann_output/right/" fails because there is no file named "COAD_selectedTumor_right_pathabundance.tsv". After I manually renamed the file in that directory to this name, the code from lines 92 to 115 and 121 to 136 resulted in errors:

   Quitting from lines 32-78 (sample_bootstrapping.Rmd) Error in feature_binning_decision(metadata = full_metadata, columns_to_be_tested = params$cont_properties[[x]], : could not find function "feature_binning_decision"

   In other words, the function "feature_binning_decision" cannot be found.
4. Step 14 also failed. We used gdc-client to download the data listed in the .tsv table and then decompressed it in batch. However, there is no file named "fpkm_manifest.tsv" in the directory. I manually renamed what may be the intended file, "gdc_manifest_all_htseq_fpkm_files_2021_03_08.txt", to "fpkm_manifest.tsv", but this attempt also failed.

We look forward to your reply. Thank you very much.

Best Regards, Susan

Susan0929 commented 1 year ago

Dear Gaia:

With your help, we have completed almost all of the steps in the README. Two questions remain.

Q1: After you renamed the function from feature_binning_decision to property_binning_decision, the error "could not find function 'feature_binning_decision'" was resolved. However, a new error has emerged:

Quitting from lines 32-78 (sample_bootstrapping.Rmd)
Error in colnames<-(tmp, value = "mutation_burden_level") : attempt to set 'colnames' on an object with less than two dimensions
Calls: ... eval_with_user_handlers -> eval -> eval -> colnames<-
Execution halted

Q2: We attempted to execute the R code in Step 14, but the process is killed by Ubuntu. In light of this, could you tell us how much memory and how many CPUs are needed to run this code successfully?

We look forward to your reply. Thank you very much.

Best Regards,

Susan

王姝洁


SamGa3 commented 1 year ago

Hi Susan,

I apologize for the delayed answer; personal and health issues have recently limited my availability. Thank you for your patience, and thank you for pointing out the spelling and path errors, which are now corrected. The other two questions from your last comment are also solved: I have uploaded the latest version of the code, with the correct function names and the missing lines. I have also modified the last script (step 14) so that it runs on each cancer type separately; it can now run on a machine with 6 CPUs and 15 GB of memory.

Let me know if I can be helpful, Gaia

Susan0929 commented 1 year ago

Dear Gaia, Thank you for your response and patience. I have another question, this time about the article. In the tumor-specific microbiome analysis, why is esophageal cancer not considered separately but instead classified with head and neck tumors? Is it because there are too few cases, or is there some other reason? Could you please share the reasoning behind this choice?

Look forward to your reply. Thank you very much.

Best Regards,

Susan


SamGa3 commented 12 months ago

Hi Susan,

I considered TCGA-provided cancer types (https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations). Please note that Esophageal carcinoma is abbreviated as ESCA in TCGA, consisting of approximately 184 patients, and is distinct from HNSC (Head and Neck squamous cell carcinoma). I focused my analysis solely on HNSC samples and did not include ESCA in my study.

Best, Gaia

Susan0929 commented 12 months ago

Dear Gaia,

I sincerely appreciate your prompt response and patience in addressing the problems encountered during Step 12, Bootstrapping, and the optimization process in Step 14. However, it appears that there might be issues like incorrect paths or missing files in the R code for Step 14. Have you encountered this problem during the debugging process?

Quitting from lines 28-62 (from_FPKM_to_TPM.Rmd) Error in fread(paste(params$output_fpkm, "/", x, "_fpkm.txt", sep = ""), : File '../../data/RNAseq/FPKM//TCGA-COAD_fpkm.txt' does not exist or is non-readable. getwd()=='/microbiome_reconstruction/scripts/gene_expression'

Thank you for your assistance. Look forward to your reply.

Thank you very much.

Best Regards, Susan

王姝洁


SamGa3 commented 11 months ago

Dear Susan,

I updated the workflow to automatically download all the FPKM data from GDC and convert them to TPM. Please note that this step is not mandatory for the workflow: it creates the gene expression tables required as input to CIBERSORTx, and the output files from CIBERSORTx are already available as microbiome_reconstruction/metadata/*/_immuneInfiltration_metadata.txt. I appreciate your effort in identifying errors so that I can improve the code.

Best, Gaia
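For readers who only want to see what an FPKM to TPM step computes: a commonly used conversion rescales each sample's FPKM values so that they sum to one million. The sketch below shows that rescaling in Python. It illustrates the general formula only and is not a transcription of from_FPKM_to_TPM.Rmd, which also takes a gene length table and may work differently; the function name and the toy gene values are made up.

```python
def fpkm_to_tpm(fpkm_by_gene):
    """Rescale one sample's FPKM values so they sum to 1e6 (illustrative only)."""
    total = sum(fpkm_by_gene.values())
    if total <= 0:
        raise ValueError("sample has no expressed genes; cannot rescale")
    # TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6
    return {gene: value / total * 1e6 for gene, value in fpkm_by_gene.items()}


if __name__ == "__main__":
    # Toy values, not real TCGA data.
    sample = {"geneA": 10.0, "geneB": 30.0, "geneC": 60.0}
    for gene, tpm in fpkm_to_tpm(sample).items():
        print(gene, round(tpm, 2))
```

Because every sample is scaled to the same total, TPM values are directly comparable across samples in a way raw FPKM values are not.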

Susan0929 commented 11 months ago

Dear Gaia,

I sincerely appreciate your prompt response and your patience in addressing the issues encountered during Step 14. The R code in Step 14 appears to be error-free; however, it seems there might be issues when executing the script TPM_conversion_commands.R. While the script successfully generates TCGA-COAD_fpkm.txt and TCGA-GBM_fpkm.txt, it encounters a failure in generating TCGA-LUAD_fpkm.txt. Have you come across this problem during the debugging process?

Quitting from lines 28-62 (from_FPKM_to_TPM.Rmd) Error in fread(paste(params$output_fpkm, "/", x, "_fpkm.txt", sep = ""), : File '../../data/RNAseq/FPKM//TCGA-LUAD_fpkm.txt' does not exist or is non-readable. getwd()=='/microbiome_reconstruction/scripts/gene_expression'

Thank you for your assistance. Look forward to your reply.

Thank you very much.

Best Regards,

Susan

王姝洁


SamGa3 commented 11 months ago

Dear Susan,

I have not encountered that error. From your error message, I see there is a wrong path: "../../data/RNAseq/FPKM//TCGA-LUAD_fpkm.txt" instead of "../../data/RNAseq/FPKM/tcga/TCGA-LUAD_fpkm.txt". I believe this might be the issue; could you please verify it? The command reported in the file microbiome_reconstruction/scripts/gene_expression/TPM_conversion_commands.R is correct:

    # LUAD
    rmarkdown::render("scripts/gene_expression/from_FPKM_to_TPM.Rmd",
      params = list(
        manifest = "../../data/RNAseq/FPKM/tcga/LUAD_fpkm_manifest.tsv",
        dir = "../../data/RNAseq/FPKM/tcga/LUAD/",
        converter_tab = "../../data/RNAseq/FPKM/tcga/gene_annotation_v22_gene_length.txt",
        output_fpkm = c("../../data/RNAseq/FPKM/tcga/"),
        output_tpm = c("../../data/RNAseq/TPM/tcga/")
      ),
      output_file = "../../data/RNAseq/TPM/tpm.html"
    )

Moreover, please check that you are running the script from the folder "/microbiome_reconstruction", otherwise the relative paths wouldn't work and could result in an error similar to the one you posted.

Let me know if you manage to fix the problem or if I should investigate further.

Best, Gaia
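The two points in this reply (the `tcga` subfolder and the directory the paths are resolved from) can be illustrated by resolving the relative path by hand. This is a sketch in Python under two assumptions: that the Rmd is evaluated from scripts/gene_expression, as the getwd() in the error message suggests, and that the file name is built like the paste(...) call shown in that message. The helper name is hypothetical.

```python
import posixpath


def resolve_fpkm_table(repo_root, output_fpkm, project):
    """Return the absolute path the FPKM table would be read from (hypothetical helper)."""
    # Directory the relative paths are resolved against (assumption, see above).
    rmd_dir = posixpath.join(repo_root, "scripts", "gene_expression")
    # Mirrors paste(output_fpkm, "/", project, "_fpkm.txt", sep = "").
    relative = f"{output_fpkm}/{project}_fpkm.txt"
    # normpath collapses "..", and the doubled slash left by the trailing "/".
    return posixpath.normpath(posixpath.join(rmd_dir, relative))


if __name__ == "__main__":
    root = "/microbiome_reconstruction"
    # Parameter value from the command quoted above.
    print(resolve_fpkm_table(root, "../../data/RNAseq/FPKM/tcga/", "TCGA-LUAD"))
    # Parameter value implied by the path in the reported error message.
    print(resolve_fpkm_table(root, "../../data/RNAseq/FPKM/", "TCGA-LUAD"))
```

Comparing the two printed locations with what is actually on disk shows which version of the parameters a given checkout is using.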

Susan0929 commented 11 months ago

Dear Gaia,

I have reproduced all the steps outlined in README.md, but some of the HTML files show no results (for example, tpm.html generated in step 14).

Is that normal? Looking forward to your reply.

Thanks again.

Best Regards, Susan

王姝洁


SamGa3 commented 10 months ago

Hi Susan,

Yes, it is normal. I am happy that you completed the workflow successfully.

Best, Gaia