EricArcher / strataG

strataG is a toolkit for haploid sequence and multilocus genetic data summaries, and analyses of population structure.

arlequinWrite() output ARP formatting #37

Closed Gillous closed 4 years ago

Gillous commented 4 years ago

Hello Eric,

I've been using the arlequinWrite function, but the output file now contains broken strings (see the examples at the end). The previous release was working great. I am on v2.4.910n, using R v4.0.2. The issue is random: generating several ARP files does not break the same sample names each time, but the output ARP files still contain errors.

Also, I sometimes use a modified version of arlequinWrite, since I am working on haploid microsatellite data and this function only handles microsatellites with ploidy > 1. If there is a way to work around this, I'd be more than happy!

I hope you can find a fix for this for the community!

All the best,

kind regards,

Gilles

Examples of broken ARP output files

The first two letters of "SampleSize" are missing, as is the carriage return before "SampleData=":


SampleName="Guyana" mpleSize=3SampleData={

or

"Sa" is missing from "SampleSize" and "SampleData"


SampleName="Uganda247"

mpleSize=18 mpleData={5652 1 7 4 2 2 5 2 3 4 12 3 3 3 3 3 5653 1 7 4 2 2 5 2 3 4 12 3 3 3 3 3

or

1st sample name is missing


SampleName="Ivory Coast221" SampleSize=14 SampleData={ 1 7 4 2 2 5 2 3 4 9 3 3 3 3 3 1803 1 7 4 2 2 5 2 3 4 9 3 3 3 3 3

or

some random carriage returns are missing and names are incomplete


[[Structure]] StructureName="A group of 40 populations analyzed for MICROSAT" NbGroups=1 Group= { "Australia" "Benin" "Brazil" "Burkina Faso" "Cameroon" "China" "Colombia" "Comoros" "Costa Rica" "El Salvador" "French Guiana" "Guadeloupe" "Guatemala" "Guyana" "Honduras" "India" "Indonesia" "Ivory Coast" "Japan" "Kenya" "Madagascar" "Malaysia" "Martinique" "Mauritius" "Mayotte" "Mexico"

ew-Caledonia""Peru" "Philippines" "Reunion" "Rodrigues" "Seychelles"

outh Africa" witzerland""Taiwan" "Tanzania" "Thailand" "Trinidad" "Uganda"

SA"}
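The truncated keywords in the examples above can be detected automatically. The following is a minimal sketch in base R, not part of strataG: `find_truncated_keywords()` is a hypothetical helper that flags lines where a keyword tail such as `mpleSize=` appears without its leading "Sa". It only covers truncated keywords; it does not detect missing carriage returns or truncated population names.

```r
# Hypothetical helper (not part of strataG): flag lines of ARP text in which
# a keyword seems to have lost its first two characters, e.g. "mpleSize=".
find_truncated_keywords <- function(lines,
                                    keywords = c("SampleName", "SampleSize", "SampleData")) {
  hits <- data.frame(line = integer(0), keyword = character(0),
                     stringsAsFactors = FALSE)
  for (kw in keywords) {
    prefix <- substr(kw, 1, 2)   # "Sa"
    tail   <- substring(kw, 3)   # e.g. "mpleSize"
    # Negative lookbehind: the tail followed by "=" but NOT preceded by the prefix.
    pattern <- paste0("(?<!", prefix, ")", tail, "=")
    idx <- which(grepl(pattern, lines, perl = TRUE))
    if (length(idx) > 0) {
      hits <- rbind(hits, data.frame(line = idx, keyword = kw,
                                     stringsAsFactors = FALSE))
    }
  }
  hits
}

# Lines copied from the broken examples in this report.
broken <- c(
  'SampleName="Guyana" mpleSize=3SampleData={',
  'SampleName="Uganda247"',
  'mpleSize=18 mpleData={5652 1 7 4 2 2 5 2 3 4 12 3 3 3 3 3'
)
print(find_truncated_keywords(broken))
```

For a file on disk, the same check could be run as `find_truncated_keywords(readLines("vn14GPSpop.arp"))`.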

EricArcher commented 4 years ago

I've looked at the code for arlequinWrite() and can't immediately see what would be causing this behavior. It looks like an issue with the write() function that the code uses to write the lines of the text file. This is a base function, so I can't understand how it would be malfunctioning this way. Can you tell me what system you're using? If you could try to make a reprex, that would help tremendously.
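One way to gather the system details requested here is a short base R snippet like the one below. It is only a sketch: the `env_report` name and the choice of fields are illustrative assumptions, not something strataG requires.

```r
# Collect basic environment details to paste into a bug report.
# 'env_report' is an illustrative name; the chosen fields are a suggestion.
sys <- Sys.info()
env_report <- list(
  r_version  = R.version.string,      # e.g. the "R version x.y.z" string
  platform   = R.version$platform,    # build platform of this R installation
  os         = sys[["sysname"]],      # operating system name
  os_release = sys[["release"]],      # operating system release
  locale     = Sys.getlocale(),       # locale settings can affect text output
  encoding   = getOption("encoding")  # default encoding used for connections
)
str(env_report)

# sessionInfo() additionally lists attached packages and their versions.
print(sessionInfo())
```

Pasting the printed output into the issue, together with the exact call that produced the broken file, should cover most of what is needed to attempt a reproduction.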

As for writing haploid microsatellites, I'll need more information as to how you format this data. Would you open a separate issue for this so we can keep track of it independently?

Gillous commented 4 years ago

Thanks a lot Eric,

I will make a separate issue for the haploid microsat data.

I am on Windows 10 b1909; RStudio v1.3.959; R v4.0.2

Here is my code:

library(ade4)
library(adegenet)
library(poppr)
library(strataG)
arlequinWrite(genind2gtypes(vn14GEOpop), file = "vn14GPSpop.arp")

Attached: my genind object, vn14GEOpop.zip

Also attached: my modified function for this object, arlequinWrite2MICROSAT.zip. I only changed line 13 from if(getPloidy(g) > 1) to if(getPloidy(g) == 1), and line 44 from if(getPloidy(g) == 1) { # Sequences to if(getPloidy(g) == 2) { # Sequences

Thanks in advance!

Gilles

Gillous commented 4 years ago

Hi Eric,

Did you have a chance to take a look at the issue relating to the formatting output?

Best wishes,

Gilles

EricArcher commented 4 years ago

I am slogging through this and a few other issues this week. The plan is to have them done by the end of the week. I'll close the issues out once the fixes are pushed to the repo.

Cheers, e.

EricArcher commented 4 years ago

I can't reproduce this formatting issue, nor can I see what might be causing it. The file writing is being done by the base::write() function. The behavior implies that something is interrupting the writing of random lines when this function is called. Is it still happening, and if so, can you give me your machine's specifications? When it happens, are other processes running in the background?
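To test whether base::write() itself is dropping characters on a given machine, a round-trip check can help: write some lines, read them back, and compare. The sketch below uses made-up lines that only resemble ARP content and a hypothetical helper, `check_round_trip()`; it is not the code strataG uses.

```r
# Hypothetical helper: write lines with base::write(), read them back,
# and report whether the file content matches what was written.
check_round_trip <- function(lines, path = tempfile(fileext = ".arp")) {
  # For character vectors write() defaults to one element per line;
  # ncolumns = 1 makes that explicit.
  write(lines, file = path, ncolumns = 1)
  read_back <- readLines(path)
  ok <- identical(read_back, lines)
  if (!ok) {
    if (length(read_back) != length(lines)) {
      message("Line count differs: wrote ", length(lines),
              ", read ", length(read_back))
    } else {
      message("Lines that differ: ",
              paste(which(read_back != lines), collapse = ", "))
    }
  }
  ok
}

# Made-up lines standing in for ARP content; not real strataG output.
arp_lines <- c(
  'SampleName="Guyana"',
  'SampleSize=3',
  'SampleData={',
  '}'
)
round_trip_ok <- check_round_trip(arp_lines)
print(round_trip_ok)
```

Running the check many times in a loop, ideally while the background processes in question are active, would show whether the corruption can be triggered outside the package.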

Gillous commented 4 years ago

Hi Eric!

I tried the updated version of your package, including the switch for haploid data, and everything seems to be back to normal: the file formatting works and ARLEQUIN is happy with it!

I don't really know whether updating your package and/or some base R package fixed it, but I am really happy that it works.

Thanks a lot for your time and for building such a great package!

Cheers!