Open sdash-github opened 1 year ago
Just set up a test gigwa2 instance at: http://dev.lis.ncgr.org:50053/gigwa in case you want to start trying to populate a database with an updated version of the dataset that needed the naming changes. I changed the admin password from the default but will send it to you.
Thanks.
On 2022/9/15 4:21 PM, adf-ncgr wrote:
just set up a test gigwa2 instance at: http://dev.lis.ncgr.org:50053/gigwa in case you want to start trying to populate a database with an updated version of dataset which needed the naming changes. I changed the admin password from the default but will send it to you.
2.1 I probably already have a script that could handle the renaming if you want me to take a look at it.
Attempting the minicore dataset as a pilot. Verifying that the datastore and exported versions have the same number of rows in the VCF file. In v2/Arachis/hypogaea/diversity/aradu1_araip1.gnm1.div.Otyama_Wilkey_2019, arahy.aradu1_araip1.gnm1.div.Otyama_Wilkey_2019.snp_chip.vcf.gz has 15897 rows (grep -v "^#").
Exported from the current PeanutBase Gigwa: sdmilam/projects/PB-NCGR/GigwaReloading202209/US_Peanut_MiniCore15897variants104individuals.vcf has 15897 rows.
So we should use the datastore (DS) version of the file, for standardization and consistency.
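For repeating this check on the other datasets, here is a minimal Python sketch of the same row count that `grep -v "^#"` gives, working on both plain and gzipped VCF files. The function names `count_vcf_records` and `same_record_count` are made up for illustration; this is not the script that was actually used.

```python
import gzip


def count_vcf_records(path):
    """Count lines that do not start with '#', like grep -v '^#' | wc -l."""
    # Assumption: files ending in .gz are gzip-compressed, everything else is plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if not line.startswith("#"))


def same_record_count(path_a, path_b):
    """True if both VCF files hold the same number of non-header rows."""
    return count_vcf_records(path_a) == count_vcf_records(path_b)
```

Equal row counts only show that the two files have the same number of variants; they do not show that the genotype calls are identical.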
Upload went well. Needs some more meta info like chip name, etc. for completeness.
Unable to edit meta-information after upload (e.g. adding more text to the description, the chip name, etc.). So the database should be deleted and reloaded after collecting all the info. A way to edit it was found neither in the docs nor in their publication.
Now all three datasets are loaded: Core, Minicore and the all encompassing African lines AG_4057_14471. Working dir: (SDash) sdmilam/projects/PB-NCGR/GigwaReloading202209
Forgot the Clevenger_Korani_2018.snp_chip dataset of African lines. Now added with description.
Peggy pointed out that she sees numbers instead of genotype names. The spreadsheet they sent has numbers instead of genotype names in a great many rows, e.g. rows 1273-1296:
a550846-4390129-041321-033_B01.CEL 139915
a550846-4390129-041321-033_B02.CEL 274253
... ...
a550846-4390129-041321-033_B23.CEL 185633
a550846-4390129-041321-033_B24.CEL 270907
I think Gigwa displays the individuals sorted, so the numbers (used as genotype names in the spreadsheet) appear at the top, and there are a lot of them.
Requested Peggy to send an updated spreadsheet.
On 2022/10/7 7:57 AM, Peggy Ozias-Akins wrote:
Hi Sudhansu,
Before we output another file, I want to be sure I understand the issue. The numbers in your email below are PI numbers although they should be preceded by PI. We can fix that. However, it looks like the script only imported genotype IDs that had numbers and not text. For example, rows 100-101 (and many others) in the spreadsheet show actual sample IDs that correspond to CEL file names. I don’t see
a550846-4381366-061020-491_D06.CEL Ug-183_Oug-ICGV SM 02724
a550846-4381366-061020-491_D07.CEL Ug-78_Oug-S.4 X 99044 RED UG
Regards, Peggy
My response after checking:
Hi Peggy,
I looked for Ug-183_Oug-ICGV-SM-02724 and Ug-78_Oug-S.4-X-99044-RED-UG in the individuals dropdown lookup and found them. Please note that all the spaces have been converted to a '-' character, because Gigwa behaves erroneously when loading IDs that contain spaces. So please look them up with hyphens in place of spaces.
Please also note that the spreadsheet has 4062 rows but the VCF file has 4057 '.CEL' columns, which have been replaced with the genotype names. So five genotypes in the spreadsheet won't be found in this Gigwa dataset; I don't know which five.
Please let me know if there are other issues and I will address them. And thanks for looking at the Gigwa data thoroughly.
Sudhansu
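The two checks mentioned in this reply (spaces converted to hyphens, and five spreadsheet rows with no matching VCF column) could be sketched in Python as below. This is only an illustration under assumptions: the helper names are hypothetical, and it assumes the spreadsheet's CEL file names are compared against the sample columns of the original VCF `#CHROM` header line, before any renaming.

```python
def sanitize_id(name):
    """Replace each space with '-', as was done before loading into Gigwa."""
    return name.strip().replace(" ", "-")


def vcf_sample_names(header_line):
    """Return sample columns from a '#CHROM' header line.

    The first nine tab-separated columns of that line are fixed by the VCF
    format (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT).
    """
    return header_line.rstrip("\n").split("\t")[9:]


def missing_from_vcf(spreadsheet_ids, vcf_samples):
    """List spreadsheet IDs that have no matching sample column in the VCF."""
    present = set(vcf_samples)
    return [s for s in spreadsheet_ids if s not in present]


print(sanitize_id("Ug-183_Oug-ICGV SM 02724"))      # Ug-183_Oug-ICGV-SM-02724
print(sanitize_id("Ug-78_Oug-S.4 X 99044 RED UG"))  # Ug-78_Oug-S.4-X-99044-RED-UG
```

Running `missing_from_vcf` with the 4062 CEL names from the spreadsheet and the 4057 sample columns of the VCF would name the rows that have no column, if the mismatch is indeed on the CEL file names.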
New spreadsheet is now available.
TO DO:
Generate file without spaces in names, generate VCF with new names, delete current dataset and reload.
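A rough Python sketch of the "generate VCF with new names" step, assuming a mapping from CEL file name to genotype name has already been read from the spreadsheet. The names `rename_header`, `rename_vcf_lines` and `cel_to_name` are hypothetical, and this is not necessarily how the reload files were actually produced.

```python
def rename_header(header_line, cel_to_name):
    """Rewrite the sample columns of a '#CHROM' line using cel_to_name.

    Columns without a mapping keep their original name. Spaces in the new
    names are replaced with '-' so that no sample ID contains a space.
    """
    fields = header_line.rstrip("\n").split("\t")
    fixed, samples = fields[:9], fields[9:]  # first nine columns are fixed by VCF
    renamed = [cel_to_name.get(s, s).strip().replace(" ", "-") for s in samples]
    if len(set(renamed)) != len(renamed):
        # Two CEL files mapping to one genotype name would give duplicate columns.
        raise ValueError("renaming produced duplicate sample names")
    return "\t".join(fixed + renamed) + "\n"


def rename_vcf_lines(lines, cel_to_name):
    """Yield VCF lines unchanged, except for the '#CHROM' header line."""
    for line in lines:
        yield rename_header(line, cel_to_name) if line.startswith("#CHROM") else line
```

The duplicate check is there because two spreadsheet rows could carry the same genotype name, which would otherwise produce a VCF with repeated sample columns.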
Gigwa, comparing haplotypes of two individuals: sometimes the bottom panel doesn't update. Try showing it to adf to see if there is a trick.
AG_4057_14471 dataset reloaded after generating the necessary reformatted files. Work dir: sdmilam/projects/PB-NCGR/GigwaReloading202209. Invited Peggy to look at it. Will close the issue after she responds.
The Gigwa container needs to be migrated so that it is housed within NCGR-LIS infrastructure and linked to from the PB-Jekyll site.