ekirving / mesoneo_paper

Source code from the paper: "The Selection Landscape and Genetic Legacy of Ancient Eurasians" (2024) Nature
https://doi.org/10.1038/s41586-023-06705-1
MIT License

About the input files #1

Closed by JielinLi 7 months ago

JielinLi commented 8 months ago

Hello, what are the input files 'imputed', 'andres', 'inv17_h1h2', and 'mathieson' in the config? Thank you!

ekirving commented 8 months ago

These are tabular text files that contain SNPs of interest; i.e., they are used for specifying a list of SNPs to run the selection pipeline on.

For example, "mathieson" points to a file called 41586_2015_BFnature16152_MOESM270_ESM.txt, which is the 'Supplementary Data 3' table from Mathieson et al. (2015): https://www.nature.com/articles/nature16152#Sec14
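
As a rough illustration (not taken from the repository), a tabular SNP file of this kind could be reduced to a list of SNP identifiers along the following lines. The column name `rsid`, the tab delimiter, and the example rows are assumptions made up for this sketch; the real files may use different headers.

```python
import csv
import io


def read_snp_list(handle, id_column="rsid", delimiter="\t"):
    """Return the unique SNP identifiers from a tabular text file, in file order.

    `id_column` is a hypothetical header name; adjust it to the actual file.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise ValueError(f"column {id_column!r} not found in header: {reader.fieldnames}")
    seen = set()
    snps = []
    for row in reader:
        snp = (row[id_column] or "").strip()
        # Skip blank identifiers and repeated rows for the same SNP.
        if snp and snp not in seen:
            seen.add(snp)
            snps.append(snp)
    return snps


# Made-up example table; real files will have different columns and values.
example = io.StringIO(
    "rsid\tchrom\tpos\n"
    "rs0000001\t1\t1000\n"
    "rs0000002\t2\t2000\n"
    "rs0000001\t1\t1000\n"
)
snps = read_snp_list(example)
print(snps)
```

To read a file on disk instead, pass an open file handle, e.g. `with open(path, newline="") as fh: read_snp_list(fh)`.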

JielinLi commented 8 months ago

I understand, thank you very much. I also wanted to ask: why is 'chr3_true_paths' listed separately in the input files?

ekirving commented 7 months ago

We made two alternative versions of the chr3 simulations, one in which the ancestral paths were inferred by the machine learning classifier (i.e., chr3_inferred_paths) and another in which the genotypes were labeled with the true ancestral paths from the simulation (i.e., chr3_true_paths). This allowed us to isolate the effects of path mislabelling on the selection test, and to confirm that there was no major bias.

JielinLi commented 7 months ago

Thank you very much for your patient response! Does it mean that I can choose not to include this file when I'm working? I also wanted to ask, what is the difference between 'ancestral_paths_new' and 'ancestral_paths_v3'? They seem to have the same metadata.

ekirving commented 7 months ago

Yes, you can choose to omit any/all of these input files. All you need are the input files necessary to make the specific outputs you are requesting from the snakemake scheduler.

The difference between ancestral_paths_new and ancestral_paths_v3 is that the ancestral path local ancestry model was updated during review of the paper and a new version (v3) was created.

JielinLi commented 7 months ago

I understand, thank you so much!

JielinLi commented 7 months ago

Hello, when using RELATE, I downloaded 'humanancestor{chr}.fa' from 'http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/supporting/ancestral_alignments/'. Are these files the same as '{chr}.humanc_e71.fa' in your 'relate.smk'? Thank you!

ekirving commented 7 months ago

No, they are not the same version of the human ancestral sequence. The version you linked to is from Ensembl build e59 and the version I used is e71. In practice, this will probably not have a major impact, as there are unlikely to have been major changes in the inference of the ancestral alleles between these two builds.

JielinLi commented 7 months ago

Thank you so much!