HenrikBengtsson / affxparser

🔬 R package: This is the Bioconductor devel version of the affxparser package.
http://bioconductor.org/packages/devel/bioc/html/affxparser.html

readPgf(): Coerce certain header fields to integers #4

Closed: HenrikBengtsson closed this issue 9 years ago

HenrikBengtsson commented 9 years ago

Header fields of PGF files are currently read as character strings. However, some of them can be coerced to integers (e.g. num-cols and probesets). We should update readPgf() to coerce such header fields. Note that these fields are optional, i.e. we must not assume they exist.
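A minimal sketch of such a coercion, written as a hypothetical helper `coerceHeaderFields()` (not part of affxparser's API). It assumes the header is the named list that readPgf() returns in `data$header`, and the default field list is an assumption based on which fields hold integer values in the examples in this thread. Because the fields are optional, it only converts those that are present:

```r
# Hypothetical helper (not part of affxparser): coerce selected PGF header
# fields from character to integer, skipping any that are absent.
coerceHeaderFields <- function(header,
                               fields = c("num-cols", "num-rows",
                                          "probesets", "datalines")) {
  # Only touch fields that actually exist; they are optional in PGF files.
  present <- intersect(fields, names(header))
  for (field in present) {
    header[[field]] <- as.integer(header[[field]])
  }
  header
}

# A header that has some of the optional fields...
hdr <- list(chip_type = "DroGene-1_0-st", `num-cols` = "1190",
            probesets = "176275", sequential = "1")
hdr <- coerceHeaderFields(hdr)
str(hdr)

# ...and one that lacks them, which is returned unchanged.
hdr2 <- list(chip_type = "HuGene-1_0-st-v1", pgf_format_version = "1.0")
identical(coerceHeaderFields(hdr2), hdr2)
```

Fields not in the list (such as `sequential`) are left as character strings.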

EXAMPLE WITH THE OPTIONAL FIELDS:

> data <- readPgf("DroGene-1_0-st.pgf")
> str(data$header)
List of 15
 $ chip_type         : chr "DroGene-1_0-st"
 $ lib_set_name      : chr "DroGene-1_0-st"
 $ lib_set_version   : chr "r4"
 $ create_date       : chr "Mon Oct 15 16:34:36 PDT 2012"
 $ guid              : chr "28ac67b4-62f6-4028-dde0-5596fa61cd33"
 $ pgf_format_version: chr "1.0"
 $ num-cols          : chr "1190"
 $ num-rows          : chr "1190"
 $ probesets         : chr "176275"
 $ datalines         : chr "1813223"
 $ sequential        : chr "1"
 $ order             : chr "row_major"
 $ header0           : chr "probeset_id\ttype\tprobeset_name"
 $ header1           : chr "\tatom_id"
 $ header2           : chr "\t\tprobe_id\ttype\tgc_count\tprobe_length\tinterrogation_position\tprobe_sequence"

EXAMPLE WITHOUT THE OPTIONAL FIELDS:

> data <- readPgf("HuGene-1_0-st-v1.r4.pgf")
> str(data$header)
List of 9
 $ chip_type         : chr "HuGene-1_0-st-v1"
 $ pgf_format_version: chr "1.0"
 $ lib_set_name      : chr "HuGene-1_0-st-v1"
 $ lib_set_version   : chr "r4"
 $ guid              : chr "0000050091-1228862702-0302010399-1388352192-1436570985"
 $ create_date       : chr "Tue Dec  9 14:45:02 PST 2008"
 $ header0           : chr "probeset_id\ttype"
 $ header1           : chr "\tatom_id"
 $ header2           : chr "\t\tprobe_id\ttype\tgc_count\tprobe_length\tinterrogation_position\tprobe_sequence"

HenrikBengtsson commented 9 years ago

Done in branch feature/readPgf-header-coercion (commit a33c3a1):

> data <- readPgf("DroGene-1_0-st.pgf")
> str(data$header)
List of 15
 $ chip_type         : chr "DroGene-1_0-st"
 $ lib_set_name      : chr "DroGene-1_0-st"
 $ lib_set_version   : chr "r4"
 $ create_date       : chr "Mon Oct 15 16:34:36 PDT 2012"
 $ guid              : chr "28ac67b4-62f6-4028-dde0-5596fa61cd33"
 $ pgf_format_version: chr "1.0"
 $ num-cols          : int 1190
 $ num-rows          : int 1190
 $ probesets         : int 176275
 $ datalines         : int 1813223
 $ sequential        : chr "1"
 $ order             : chr "row_major"
 $ header0           : chr "probeset_id\ttype\tprobeset_name"
 $ header1           : chr "\tatom_id"
 $ header2           : chr "\t\tprobe_id\ttype\tgc_count\tprobe_length\tinterrogation_position\tprobe_sequence"