Open sdash-github opened 4 years ago
Will take the approach of replacing spaces in the individual names with '_' and reloading into gigwa.
data/public/Arachis_hypogaea/arahy.gnm1.div.LZ50]$ zcat arahy.gnm1.div.LZ50.snp_chip.hmp.gz | head -1 | perl -pe 's/\t/\n/g' | tail -n +12 | wc -l
1151
Total 1151 headings for individuals.
$ zcat arahy.gnm1.div.LZ50.snp_chip.hmp.gz | head -1 | perl -pe 's/\t/\n/g' | tail -n +12 | grep '[[:space:]]' | wc -l
544 # individuals with space in the name
$ zcat arahy.gnm1.div.LZ50.snp_chip.hmp.gz | head -1 | perl -pe 's/\t/\n/g' | tail -n +12 | grep -v '[[:space:]]' | wc -l
607 # without space in name
$ zcat arahy.gnm1.div.LZ50.snp_chip.hmp.gz | head -1 | perl -pe 's/\t/\n/g' | tail -n +12 | tr ' ' '_' | grep '[[:space:]]' | wc -l
0
$ zcat arahy.gnm1.div.LZ50.snp_chip.hmp.gz | head -1 | perl -pe 's/\t/\n/g' | tail -n +12 | tr ' ' '_' | grep -v '[[:space:]]' | wc -l
1151 # space to '_' works
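The three counts above can also be taken in one pass over the header. A minimal sketch, assuming the standard 11 HapMap metadata columns (so sample names start at column 12). The function name `count_header_names` is hypothetical, and `gzip -dc` is used as a portable equivalent of `zcat`:

```shell
# Hypothetical helper: summarise sample names in the header of a gzipped hmp file.
# Assumes 11 metadata columns, so sample names start at column 12.
count_header_names() {
    gzip -dc "$1" | head -n 1 | awk -F '\t' '
        {
            for (i = 12; i <= NF; i++) {
                total++
                if ($i ~ / /) with_space++   # name contains at least one space
            }
        }
        END {
            printf "total=%d with_space=%d without_space=%d\n",
                   total, with_space, total - with_space
        }'
}
```

Usage: `count_header_names arahy.gnm1.div.LZ50.snp_chip.hmp.gz`. Unlike the `grep '[[:space:]]'` checks, this only looks for literal spaces, which is what the `tr ' ' '_'` substitution targets.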
Backed up the original file, then:
$ zcat arahy.gnm1.div.LZ50.snp_chip.hmp.gz | sed '1s/ /_/g' | gzip > arahy.gnm1.div.LZ50.snp_chip.SpaceRemoved.hmp.gz
Removed the original file and renamed the modified file to the original name.
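The backup, substitute, and rename steps can be wrapped so that the original is only replaced once the new file has been checked. This is a sketch under assumptions, not what was actually run here: the function name `underscore_header` and the `.bak` / `.tmp` suffixes are hypothetical.

```shell
# Hypothetical helper: replace spaces in the header line of a gzipped hmp
# file with '_', keeping a backup and swapping files only after checks pass.
underscore_header() {
    hmp_in=$1
    hmp_bak="$hmp_in.bak"   # backup name is an assumption
    hmp_tmp="$hmp_in.tmp"   # scratch name is an assumption

    cp -p "$hmp_in" "$hmp_bak" || return 1

    # Same substitution as in the thread: only line 1 (the header) is edited.
    gzip -dc "$hmp_in" | sed '1s/ /_/g' | gzip > "$hmp_tmp" || return 1

    # Check 1: no space is left anywhere in the new header.
    new_header=$(gzip -dc "$hmp_tmp" | head -n 1)
    case $new_header in
        *' '*) echo "header still contains spaces" >&2; return 1 ;;
    esac

    # Check 2: the line count is unchanged.
    old_lines=$(gzip -dc "$hmp_in" | wc -l | tr -d ' ')
    new_lines=$(gzip -dc "$hmp_tmp" | wc -l | tr -d ' ')
    if [ "$old_lines" -ne "$new_lines" ]; then
        echo "line count changed: $old_lines -> $new_lines" >&2
        return 1
    fi

    # Only now replace the original; the backup stays next to it.
    mv "$hmp_tmp" "$hmp_in"
}
```

Usage: `underscore_header arahy.gnm1.div.LZ50.snp_chip.hmp.gz`. If either check fails, the original file is left untouched and the scratch file remains for inspection.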
$ zcat arahy.gnm1.div.LZ50.snp_chip.hmp.gz | head -1 | perl -pe 's/\t/\n/g' | tail -n +12 | grep '[[:space:]]' | wc -l
0 # no header name with space
$ zcat arahy.gnm1.div.LZ50.snp_chip.hmp.gz | head -1 | perl -pe 's/\t/\n/g' | tail -n +12 | grep -v '[[:space:]]' | wc -l
1151
The file is now ready for loading into gigwa.
Loaded into PB-stage with Ethy after deleting the previous load. Now 1145 of the 1151 individuals in the file are loaded into the gigwa db (earlier it was 913). The instructions are in the PB project notes doc that Ethy shared with us.
TO DO: Email them when the data is in production after rollover.
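The thread does not say why six of the 1151 names are still missing after the load. One thing that may be worth ruling out (an assumption, not a confirmed cause) is that some names become identical once spaces are replaced with '_'. A minimal sketch that lists such names; the function name `duplicate_header_names` is hypothetical:

```shell
# Hypothetical helper: list sample names that occur more than once in the
# header after spaces are replaced with '_'.
# Assumes 11 metadata columns, so sample names start at column 12.
duplicate_header_names() {
    gzip -dc "$1" | head -n 1 | tr '\t' '\n' | tail -n +12 \
        | tr ' ' '_' | sort | uniq -d
}
```

Usage: `duplicate_header_names arahy.gnm1.div.LZ50.snp_chip.hmp.gz`. Empty output means every name is unique, in which case the missing individuals need a different explanation.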
Updated in DS. TO DO: Need to convert .hmp to flapjack format.
@adf-ncgr: Hi Andrew, from my quick reading it sounds like the flapjack format is different from the VCF format? Do I need flapjack installed to convert our .hmp file to flapjack?
Flapjack can import a few formats, but I'm not sure hmp is one of them. I have a VCF converter that could probably be tweaked for this purpose, though. You should probably get flapjack installed on your laptop anyway, as it will be needed to produce the flapjack format (which is really a sqlite3 db file, representing a sort of session file for flapjack). Let's discuss some more when we meet tomorrow, as I think there may be some minor problems with the files: my laptop is really struggling to load the flapjack file that's in there, and the hmp file does not seem to actually refer to arahy.gnm1 as the folder would imply.
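To see what the hmp file itself refers to, the `chrom` and `assembly#` columns (columns 3 and 6 in the standard HapMap layout) can be tabulated. A minimal sketch, assuming that layout; the function name `hmp_assembly_summary` is hypothetical:

```shell
# Hypothetical helper: count markers per (chrom, assembly#) pair in a gzipped
# hmp file. Assumes the standard HapMap layout: chrom in column 3 and
# assembly# in column 6. Output columns: chrom, assembly#, marker count.
hmp_assembly_summary() {
    gzip -dc "$1" | awk -F '\t' '
        NR > 1 { n[$3 "\t" $6]++ }              # skip the header line
        END    { for (k in n) print k "\t" n[k] }' | sort
}
```

Usage: `hmp_assembly_summary arahy.gnm1.div.LZ50.snp_chip.hmp.gz`. The chromosome and assembly values printed can then be compared by eye against the arahy.gnm1 naming.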
African breeding lines genotype data has been loaded into PB gigwa (https://peanutbase.org/data/public/Arachis_hypogaea/arahy.gnm1.div.LZ50/), but:
On 2019/12/13 10:02 AM, Jean Francois Rami wrote:
So the 1151 vs. 913 individuals need to be sorted out at PB gigwa