bodkan / admixr

An R package for reproducible and automated ADMIXTOOLS analyses
https://bodkan.net/admixr

admixR and non-standard Chromosome names #73

Closed by Marvin02860 4 years ago

Marvin02860 commented 4 years ago

Hello,

I would like to use admixR to perform some introgression tests on my data, as the interface is much easier to use than writing scripts for AdmixTools. I am working with non-human data, resulting from a targeted re-sequencing experiment. My SNPs are located on more than 500 contigs, with non-standard names, and this is a problem with Admixtools, as I get the following error when running admixR:

fatalx: bad chrom: ctg7180000255172

I tried to replace my #CHR names by numerical values from 1 to 550, but as soon as values are above 100 it does not work anymore.

I contacted a person responsible for AdmixTools scripts regarding this issue, and I got a useful reply: see https://github.com/DReichLab/AdmixTools/issues/54

However, the option proposed there for handling chromosome names in AdmixTools does not seem to be implemented in admixR. Can you think of any way to deal with this problem using admixR?

Please let me know if anything is unclear.

Any help will be greatly appreciated :)

All the best, Marvin

bodkan commented 4 years ago

Hi Marvin,

Thank you for bringing this issue to my attention!

I'm not too familiar with parameters that are necessary for this kind of non-human data, but could you test if doing something like this:

result <- d(W = popsCF, X = "CF_scot", Y = popsCG, Z = popsCh, data = snps,
            params = list(blockname = "my_contigs.txt"))

does what you need (see also this for a related issue)? It's not very pretty and could certainly be done in a nicer way but I'm currently too busy writing up my thesis to do coding. This is how optional parameters for ADMIXTOOLS are currently specified.

Let me know if this way of specifying the "blockname" parameter works.

Marvin02860 commented 4 years ago

Thank you for your quick reply, and sorry for interrupting your writing! :)

I tried what you suggested. The test does not seem to be run, although there is no error message: all the values (D, the numbers of ABBA / BABA sites, Zscore and nsnps) equal 0.

All the best, Marvin

bodkan commented 4 years ago

Hmm, this is strange.

One thing that would be extremely helpful to start with is to get a minimal reproducible example running. In this case, that means checking whether you can get the blockname solution suggested by Nick in the other GitHub repo running using just ADMIXTOOLS on the command line, without admixr being involved.

Do you know what I mean? Just add that blockname option to the par file and see if you get a reasonable result without admixr.
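For readers following along, a minimal sketch of such a standalone test is below. It only writes a par file from R. The input file names (snps.geno, snps.snp, snps.ind, pops.txt) and the par file name are hypothetical placeholders; only the blockname option and "my_contigs.txt" come from this thread.

```r
# Sketch under assumptions: all input file names below are hypothetical
# placeholders, except "my_contigs.txt", which is the file named in this thread.
par_lines <- c(
  "genotypename: snps.geno",
  "snpname: snps.snp",
  "indivname: snps.ind",
  "popfilename: pops.txt",
  "blockname: my_contigs.txt"
)

# Write the par file to a temporary directory so nothing is overwritten.
par_file <- file.path(tempdir(), "qpDstat_test.par")
writeLines(par_lines, par_file)

# The file can then be passed to ADMIXTOOLS from a shell, outside of R:
#   qpDstat -p <path to qpDstat_test.par>
```

If this run gives sensible D statistics, the block file itself is fine and any remaining problem is on the admixr side.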

If you get the analysis working using ADMIXTOOLS alone, then it will be really easy to figure out how to do this from admixr's side. Without that, it's hard to say whether the configuration of those blocks is improperly formatted (so your analysis wouldn't work in ADMIXTOOLS either) or whether it's my package that's misbehaving.

Let me know how it goes!

Marvin02860 commented 4 years ago

Hi again,

In the end, what you suggested is working with admixr! Thank you!

The problem was the format of the "my_contigs.txt" file. It needs to contain only integers, so I renamed my contigs to numbers from 1 to 550.
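A minimal sketch of that renaming step in R follows. The contig names are made-up examples apart from ctg7180000255172 (taken from the error message above), and the exact layout expected in the blockname file is not described in this thread, so only the name-to-integer mapping is shown.

```r
# Hypothetical contig names, one per SNP; only the first name appears in this thread.
contigs <- c("ctg7180000255172", "ctg7180000255172", "ctg7180000255301",
             "ctg7180000254988", "ctg7180000255301")

# Give each distinct contig an integer ID, numbered in order of first appearance.
contig_names <- unique(contigs)
contig_ids <- match(contigs, contig_names)

# Keep a lookup table so the original names can be recovered later.
lookup <- data.frame(contig = contig_names, id = seq_along(contig_names))

print(contig_ids)  # 1 1 2 3 2
```

Saving the lookup table alongside the results makes it possible to translate the integer IDs back to contig names afterwards.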

Great solution, thank you again, and sorry for the confusion.

All the best, Marvin

bodkan commented 4 years ago

No worries! I'm glad I could help and that the solution was already in the package. :)

Could you perhaps also post in the original issue thread on the ADMIXTOOLS repo (the one you linked above) so that people who run into a similar problem with admixr can find the solution here? It would be great if everyone could see that this is actually already implemented and that they don't need to wait for this feature to be added... Thanks!

I'll do my best to clarify in the tutorial how to specify various optional parameters in admixr. Right now this is not obvious at all. :/

I will also make a separate note about this too-many-contigs issue for people working with this sort of data. You're definitely not the first person working with a non-human species to run into a problem related to many contigs in the data (although this particular one was unknown to me)...

Thanks for bringing this to my attention!

Martin