Closed RNieuwenhuis closed 6 months ago
Hi Ronald,
Thanks for using PyBSASeq.
In panel a, the blue curve indicates total SNPs while the black curve indicates the significant SNPs. If a genomic region contains a gene that participate in controlling the trait, we will see SNP enrichment (more sSNPs) in this region. Otherwise no SNP enrichment will be observed, no matter how many total SNPs this region has.
The red lines in panels b, c, and d indicate confidence intervals/threshold at the genomic region level while the magenta curves in panels c and d indicate confidence intervals/thresholds at the SNP levels. The sSNP/totalSNP ratio is a parameter at the genomic region level, so its thresholds cannot be estimated at the SNP level. It is proper to estimate confidence intervals/thresholds at the genomic region level, because we try to identify genomic regions, not a few SNP, that are associated the trait. But please ignore the red lines in panel c and d for the time being, it is not published yet. You can find the details in my new manuscript on BioRXiv (https://www.biorxiv.org/content/10.1101/2023.03.12.532308v2). The difference of panel b between your attached picture and my previous publication is that different p-values are used: alpha=0.01 was used to identify sSNPs in the real dataset while alpha=0.1 was used for the same purpose in the simulated dataset for increasing threshold in my previous publications. But alpha=0.01 was used for both the real and simulated datasets in your case. You can change the alpha value when run the script.
A peak above or beyond the confidence intervals/thresholds represents a gene that participates in controlling the phenotype. Genes contributing more to the phenotype will have higher peaks while genes contributing less to the phenotype will have lower peaks. Any phenotype is the results of interactions among many genes, so it is OK that tens/hundreds peaks are identified in a BSA-Seq study.
Please let me know if you have any question.
Best,
Jianbo
Dear,
I ran PyBSASeq using default settings and was wondering about the output, especially the plots. Please find attached the png file.
I have several questions:
Furthermore, based on the different methods shown in the plot, the significant SNP method yield many more and also quite broad peak compared to the G-statistic. The delta AF method results in many more QTL as well. What would you advise be in this case? Focus on chr3 and ch6 or should all the other peaks also have the same priority?
Looking forward to your expert advise.
Kind regards,
Ronald