PMBio / limix-backup

http://pmbio.github.io/limix/
Apache License 2.0

limix_converter problem1 & dataset.getPhenotypes problem2 #22

Open JonLJ opened 8 years ago

JonLJ commented 8 years ago

Hi

PROBLEM1 I am having a problem following the 'loading files into LIMIX' tutorial, specifically with using limix_converter to convert my phenotype file 'phenotypes.csv' into hdf5 format.

If I use:

limix_converter -O ./my_file.hdf5 -C ./phenotypes.csv

I get a warning like this:

/home/jon/anaconda2/lib/python2.7/site-packages/limix/io/conversion.py:78: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delim_whitespace=False; you can avoid this warning by specifying engine='python'.
  C = pandas.io.parsers.read_csv(csv_file,sep=sep,header=None,index_col=False,*args,**kw_args)

On the other hand, if I use:

limix_converter -O ./my_file.hdf5 -C ./phenotypes.csv -D ,

in order to avoid this warning, I obtain:

......,21478,21479,21480, [...] ,21692,21693,21694) have mixed types. Specify dtype option on import or set low_memory=False.

In both cases, the conversion to hdf5 seems to work correctly. However, I do not know whether this last warning is due to the massive table of phenotypes I am using (I have around 21000 genes and 184 samples).
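The first warning above is raised by pandas, not by limix_converter itself: `sep=None` asks pandas to guess the delimiter, which only its 'python' parsing engine supports. A minimal sketch of the two ways to avoid the fallback, using pandas directly on a small made-up CSV (the data is hypothetical, and this does not show what `-D` does inside limix_converter):

```python
import io
import warnings

import pandas as pd

# Hypothetical stand-in for phenotypes.csv: comma-separated, no header row.
csv_text = "1.5,2.5,3.5\n4.5,5.5,6.5\n"

# Option 1: keep delimiter sniffing (sep=None) but name the 'python' engine
# explicitly, so pandas has no reason to emit the fallback ParserWarning.
with warnings.catch_warnings():
    # Turn a ParserWarning into an exception so it cannot pass unnoticed.
    warnings.simplefilter("error", pd.errors.ParserWarning)
    sniffed = pd.read_csv(io.StringIO(csv_text), sep=None, header=None,
                          engine="python")

# Option 2: state the delimiter, which lets the default 'c' engine parse.
explicit = pd.read_csv(io.StringIO(csv_text), sep=",", header=None)

print(sniffed.shape, explicit.shape)
```

Either way the parsed table should be the same, so the first warning is about how the file is parsed rather than a sign that the data is wrong.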

PROBLEM2 Anyway, once the file is converted, I run:

geno_reader  = gr.genotype_reader_tables('my_file.hdf5')
pheno_reader = phr.pheno_reader_tables('my_file.hdf5')
dataset = data.QTLData(geno_reader=geno_reader,pheno_reader=pheno_reader)

If I look at:

pheno_reader.pheno_matrix
pheno_reader.sample_ID
pheno_reader.phenotype_ID

Everything seems to be OK. However, when I do:

phenotypes,sample_idx=pheno_reader.getPhenotypes()

I get the following warnings:

/home/jon/anaconda2/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/home/jon/anaconda2/lib/python2.7/site-packages/numpy/core/_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
  warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)

'phenotypes' is then an empty dataframe in which only the column names are defined ([0 rows x 21694 columns]). I do not understand why, and the same thing happens when I use your sample phenotype file.
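The two RuntimeWarnings above are the ones NumPy emits when a mean or standard deviation is taken over zero rows, which fits the empty result. A minimal NumPy-only sketch reproduces them. The matrix and the all-False sample mask are made up, and whether getPhenotypes() filters samples this way is an assumption, not something confirmed here:

```python
import warnings

import numpy as np

# Hypothetical phenotype matrix: 184 samples (rows) x 5 genes (columns).
pheno_matrix = np.arange(184 * 5, dtype=float).reshape(184, 5)

# A sample selection that matches nothing leaves a matrix with zero rows
# but with all of its columns intact.
no_samples = np.zeros(184, dtype=bool)
empty = pheno_matrix[no_samples]

# Record the warnings NumPy raises for per-column statistics on zero rows.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    col_mean = empty.mean(axis=0)
    col_std = empty.std(axis=0)

messages = [str(w.message) for w in caught]
print(empty.shape)
for message in messages:
    print(message)
```

If this is what happens, the thing to check is why no samples are selected, rather than the warnings themselves.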

Many thanks, Jon

jeffhsu3 commented 8 years ago

Can you load your phenotypes CSV in pandas without problems? Do you have a header row in your phenotypes data? That might be causing the mixed-type warning.
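A quick way to run the check suggested above is to read the file with pandas directly and compare the column dtypes with and without treating the first row as a header. The sketch below uses a small made-up CSV. The layout (gene names in the first row, sample IDs in the first column) is an assumption about the file, and a table this small only shows the dtype effect, not the mixed-types warning itself:

```python
import io

import pandas as pd

# Hypothetical stand-in for phenotypes.csv: a header row of gene names and
# sample IDs in the first column.
csv_text = "sample,geneA,geneB\ns1,0.5,1.2\ns2,0.7,0.9\n"

# Read with header=None, as in the read_csv call shown in the first warning:
# the header row is treated as data, so text and numbers share each column
# and no column ends up numeric.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(raw.dtypes)

# Read with the first row as header and the first column as index: the
# remaining values are purely numeric.
parsed = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)
print(parsed.dtypes)
print(parsed.shape)
```

If the real file behaves like `raw` here, a header row or ID column being parsed as data is a likely source of the mixed-type message.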