SamGa3 / microbiome_reconstruction

GNU General Public License v3.0

Why three BRCA_bacteria_species_score.txt? #6

Open Helios417 opened 2 months ago

Helios417 commented 2 months ago

Dear Gaia,

I hope this email finds you well.

I’m currently working with the BRCA microbiome data and have encountered a few issues that I hope you can help clarify.

File Quantity: I noticed that there are three files for BRCA_bacteria_species_score (i.e., BRCA_bacteria_species_score1.txt, BRCA_bacteria_species_score2.txt, and BRCA_bacteria_species_score3.txt), whereas other cancer types only have one score file. Could you please explain why BRCA has multiple score files? Do these files represent different analyses or different sample sets?

Data Merging Issue: When attempting to merge BRCA_bacteria_species_score1.txt with BRCA_bacteria_species_unamb.txt, I encountered an error indicating that the row names do not match. Additionally, BRCA_bacteria_species_score2.txt and BRCA_bacteria_species_score3.txt also failed to merge with BRCA_bacteria_species_unamb.txt. I’ve tried several approaches, but none have been successful. Could you provide any insights into why this might be happening and suggest possible solutions?

I would greatly appreciate your guidance on these matters. Thank you in advance for your assistance.

Best regards, Helios

SamGa3 commented 1 month ago

Hi Helios417,

The reason there are 3 tables for the BRCA dataset is that the dataset is too large (too many samples) to fit within GitHub's file size limits. To address this, I added a command in the make_structure.sh script to merge the 3 tables:

# Merge big tables
cat data/RNAseq/bacteria/raw/score/BRCA_bacteria_species_score1.txt \
    data/RNAseq/bacteria/raw/score/BRCA_bacteria_species_score2.txt \
    data/RNAseq/bacteria/raw/score/BRCA_bacteria_species_score3.txt \
    > data/RNAseq/bacteria/raw/score/BRCA_bacteria_species_score.txt

When you merge the full table of unambiguous reads with only one partial score table (for example BRCA_bacteria_species_score1.txt), the script throws an error because the partial table lacks some of the samples, so the row names do not match.

To avoid these errors, merge the 3 BRCA score tables before running the microbial_estimation.Rmd script.
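
If the merge still fails after concatenating, a quick way to see which row names differ is sketched below. This is not part of the repository: the helper names `row_names` and `row_name_diff` are hypothetical, and the sketch assumes both tables are tab-separated, have a single header line, and keep the row names in the first column.

```shell
# Sketch only: helper names are hypothetical, not from the repository.
# Assumes tab-separated tables with one header line and row names in column 1.

# Print the sorted, unique first-column values of a table, skipping the header.
row_names() {
    tail -n +2 "$1" | cut -f1 | sort -u
}

# Print row names that occur in exactly one of the two tables.
# Names found only in the first table are printed unindented; names found
# only in the second table are prefixed with a tab (standard `comm -3` layout).
# Empty output means the two sets of row names are identical.
row_name_diff() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    row_names "$1" > "$tmp_a"
    row_names "$2" > "$tmp_b"
    comm -3 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example call (adjust the paths to your checkout):
# row_name_diff BRCA_bacteria_species_score.txt BRCA_bacteria_species_unamb.txt
```

If the tables use a different delimiter or store sample IDs in another column, the `cut` call would need to be adjusted accordingly.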

Let me know if this resolves the issue!

Best, Gaia