UW-GAC / wgsaparsr

Code for parsing TOPMED variant annotation files produced by the WGSA annotation tool.

freeze 5 #81

Closed rafet2005 closed 6 years ago

rafet2005 commented 6 years ago

For the freeze 5 annotation, running `get_field()` or `parse_to_file()` fails with the error "First line of source doesn't look like a WGSA header". This happens because the first line has `CHROM` where `chr` is expected.

Rafet Al-Tobasei
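Per the report above, the header check expects `chr` but the freeze 5 file has `CHROM`. Here is a minimal sketch of a check that tolerates both spellings. It assumes a tab-delimited header line, and `looks_like_wgsa_header()` is a hypothetical helper for illustration, not the function wgsaparsr actually uses:

```r
# Hypothetical helper (not part of wgsaparsr): returns TRUE when the first
# tab-delimited field of the file's first line is one of the accepted names.
looks_like_wgsa_header <- function(path, accepted = c("chr", "CHROM")) {
  # gzfile() also reads uncompressed files, so this handles .gz and plain text
  con <- gzfile(path, open = "r")
  on.exit(close(con))
  first_line <- readLines(con, n = 1L)
  if (length(first_line) == 0L) {
    return(FALSE)  # empty file: no header line at all
  }
  first_field <- strsplit(first_line, "\t", fixed = TRUE)[[1]][1]
  first_field %in% accepted
}
```

Making the accepted names a parameter keeps the check open to whatever header a later freeze uses, rather than hard-coding one spelling.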

bheavner commented 6 years ago

Thanks. I think this is fixed in the dev branch, but I'm travelling, so I have to ask for your patience. I will be able to test and push to the master branch tomorrow.

rafet2005 commented 6 years ago

Thank you. I remember Deepti addressed the chromosome format in the annotation group: "Chromosome format (1,2,3... or chr1,chr2,chr3...): The current format is 1,2,3". "CHROM" was not one of the options. Do you have a Perl script to parse the annotation file?

Thanks,


-- Rafet Al-Tobasei, Ph.D. Postdoctoral Scholar Dept. of Biostatistics Ryals Public Health Bldg 514D, 1665 University Blvd University of Alabama at Birmingham Birmingham, AL 35294-0022 Phone: (205) 934-2379 rtobasei@uab.edu
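The quoted chromosome format appears to describe the values in the chromosome column (`1, 2, 3, ...` versus `chr1, chr2, chr3, ...`) rather than the header name `CHROM`. If files that use different styles need to be joined, here is a minimal sketch of normalizing those values in R. `normalize_chrom()` is a hypothetical helper, not part of wgsaparsr:

```r
# Hypothetical helper (not part of wgsaparsr): convert chromosome labels
# between "1, 2, ..., X" and "chr1, chr2, ..., chrX" styles.
# The input may mix both styles.
normalize_chrom <- function(x, style = c("plain", "chr")) {
  style <- match.arg(style)
  # Strip any existing "chr" prefix (case-insensitive) to get the bare label
  bare <- sub("^chr", "", as.character(x), ignore.case = TRUE)
  if (style == "plain") bare else paste0("chr", bare)
}

normalize_chrom(c("chr22", "22", "CHRX"), "plain")
normalize_chrom(c("1", "chr2"), "chr")
```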

bheavner commented 6 years ago

No, the R code in this repository is what I use to parse the WGSA full annotation file to produce a simplified tab-separated file. Both the raw and simplified files are available on the exchange area. In house, we then import the simplified tsv to a database for querying and aggregating.
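The in-house database isn't named in this thread. The sketch below shows the import-then-aggregate step in rough form. It uses SQLite (via the DBI and RSQLite packages) as a stand-in and assumes the simplified tsv has a hypothetical `gene` column:

```r
library(DBI)

# Sketch only: SQLite stands in for whatever database is used in house.
# `tsv_path` points at a simplified tab-separated file; the "gene" column
# used in the query is a hypothetical name for illustration.
load_and_count <- function(tsv_path) {
  con <- dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(dbDisconnect(con))
  parsed <- read.delim(tsv_path, stringsAsFactors = FALSE)
  dbWriteTable(con, "variants", parsed)
  # Example aggregation: number of variant rows per gene
  dbGetQuery(con, "SELECT gene, COUNT(*) AS n FROM variants
                   GROUP BY gene ORDER BY gene")
}
```

Keeping the parsed output as a flat tsv means any database with a bulk loader can take over from here. SQLite is just the lightest option to demonstrate with.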

bheavner commented 6 years ago

I'll make the fix for get_field() and bump the version in a moment. Meanwhile, can you give me an example of how you're calling parse_to_file() for freeze 5? Here's how I did it:

```r
library(wgsaparsr)

chromosome <- "chr22"

snv_source_file <- paste0("/projects/topmed/downloaded_data/WGSA_annotation/freeze5/",
                   "ann_11232017/freeze.5.",
                   chromosome,
                   ".pass_and_fail.sites.gz.snp.general.annotated20171121.gz")

snv_destination <- paste0("/scratch/fr_5_first_pass/", chromosome, "_snv.tsv")

dbnsfp_destination <- paste0("/scratch/fr_5_first_pass/", chromosome,
                             "_dbnsfp.tsv")

# Alternatively, use the example config bundled with the package:
# config <- load_config(wgsaparsr_example("fr_5_config.tsv"))
config <- paste0("/projects/topmed/variant_annotation/freeze_5/database/v0/",
                 "build/parsing_code/20180212_fr_5_config.tsv")

# parse snv and dbnsfp:
parse_to_file(source_file = snv_source_file,
              destination = snv_destination,
              dbnsfp_destination = dbnsfp_destination,
              config = config,
              freeze = 5,
              chunk_size = 1000,
              verbose = TRUE)
```

rafet2005 commented 6 years ago

Hi, Sorry for the late reply.

Maybe that's my problem. I used a command from the 2017 workshop:

`parse_to_file("snp.tsv.gz", "parsed_snp.tsv", desired_columns, verbose = TRUE)`

Rafet


bheavner commented 6 years ago

No problem. There was a big change between versions 4 and 5 with the introduction of the configuration file. Freeze 6 is coming soon, so there may be another change for that (though I hope a less significant one).

If you'd like to parse freeze 4 as in the workshop, you can always use the older version of wgsaparsr. Alternatively, you can read about the configuration file format in the documentation at `?wgsaparsr::load_config`. The file at `wgsaparsr_example("fr_5_config.tsv")` is what I used to parse freeze 5, so it is authoritative for now. For example, it includes the optional notes columns.