genenetwork / genenetwork2

GeneNetwork (2nd generation)
http://gn2.genenetwork.org/
GNU Affero General Public License v3.0

Small version GN database (<=2GB) #32

Closed ghost closed 7 years ago

ghost commented 9 years ago

Done.

https://github.com/genenetwork/gndatabase/blob/master/db_webqtl_small.zip

compressed: 512MB uncompressed: 1.3GB

pjotrp commented 9 years ago

Sorry, I think we ought to move to S3, unless someone tells us how to download this file ;)

lomereiter commented 9 years ago

I second Pjotr's request. Even though I installed git-lfs and found the git lfs smudge command, it didn't help: the response is '403 Forbidden'. Another advantage of S3 over GitHub LFS servers is fairer pricing.

pjotrp commented 9 years ago

We are uploading to S3. Kinda surprised - even for beta I expect better from github

ghost commented 9 years ago

I have put it onto Amazon S3. https://s3.amazonaws.com/genenetwork2/db_webqtl_small.zip

lomereiter commented 9 years ago

Thanks Lei. It would be good to attach a README with instructions. The procedure I used is:

1) create an empty db_webqtl_s database from mysql console
2) copy files from the extracted db_webqtl_s dir into /var/lib/mysql/db_webqtl_s
3) set correct permissions (for me it was chown mysql:mysql and chmod 660 on /var/lib/mysql/db_webqtl_s/*)

I also wish a dataset with case attributes were included:

> select * from CaseAttributeXRef, ProbeSetFreeze 
>          where CaseAttributeXRef.ProbeSetFreezeId = ProbeSetFreeze.Id;
Empty set (0.04 sec)

pjotrp commented 9 years ago

The README can go into the GN2 tree (root level) in INSTALL.md.

Case attributes are required.
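
The empty result from the join query above can be reproduced in miniature. The sketch below uses an in-memory SQLite database as a stand-in for MySQL, with both tables cut down to hypothetical minimal columns: only `Id` and `ProbeSetFreezeId` come from the query in this thread, while `Name` and the sample rows are assumptions made for illustration. It shows that the join stays empty until some CaseAttributeXRef row points at a ProbeSetFreeze entry.

```python
import sqlite3


def datasets_with_case_attributes(conn):
    """Return the ProbeSetFreeze ids that have at least one case-attribute row."""
    rows = conn.execute(
        "SELECT DISTINCT ProbeSetFreeze.Id "
        "FROM CaseAttributeXRef, ProbeSetFreeze "
        "WHERE CaseAttributeXRef.ProbeSetFreezeId = ProbeSetFreeze.Id "
        "ORDER BY ProbeSetFreeze.Id"
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical, heavily simplified schema and data (not the real GN tables).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProbeSetFreeze (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE CaseAttributeXRef (ProbeSetFreezeId INTEGER)")
conn.executemany(
    "INSERT INTO ProbeSetFreeze VALUES (?, ?)",
    [(1, "dataset_a"), (2, "dataset_b")],
)

# No case-attribute rows yet: the join yields nothing.
empty_result = datasets_with_case_attributes(conn)

# After linking a case attribute to dataset 2, that dataset appears.
conn.execute("INSERT INTO CaseAttributeXRef VALUES (2)")
filled_result = datasets_with_case_attributes(conn)

print(empty_result, filled_result)
```

A check like this could serve as a quick test that a shrunken database still carries case attributes for at least one dataset.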

lomereiter commented 9 years ago

I also have a request to have at least one example dataset for each DataScale in the test database. Currently select * from ProbeSetFreeze; returns just two rows, and for both DataScale is log2.
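
A coverage check for this request could look like the sketch below. It again uses in-memory SQLite as a stand-in for MySQL; the `ProbeSetFreeze` table is reduced to hypothetical columns, and the scale name `linear` in the usage lines is only a placeholder, since the thread names no DataScale value other than log2.

```python
import sqlite3


def count_by_data_scale(conn):
    """Count the datasets in ProbeSetFreeze for each DataScale value."""
    rows = conn.execute(
        "SELECT DataScale, COUNT(*) FROM ProbeSetFreeze "
        "GROUP BY DataScale ORDER BY DataScale"
    ).fetchall()
    return dict(rows)


def missing_scales(conn, wanted):
    """Return the scales from `wanted` that have no example dataset."""
    present = count_by_data_scale(conn)
    return [scale for scale in wanted if scale not in present]


# Hypothetical stand-in for the current test database: two rows, both log2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProbeSetFreeze (Id INTEGER PRIMARY KEY, Name TEXT, DataScale TEXT)"
)
conn.executemany(
    "INSERT INTO ProbeSetFreeze VALUES (?, ?, ?)",
    [(1, "dataset_a", "log2"), (2, "dataset_b", "log2")],
)

counts = count_by_data_scale(conn)
# "linear" is a placeholder scale name, not taken from the thread.
gaps = missing_scales(conn, ["log2", "linear"])
print(counts, gaps)
```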

ghost commented 9 years ago

Fixed a bug in the small database. https://s3.amazonaws.com/genenetwork2/db_webqtl_s.zip

DannyArends commented 9 years ago

Got it working now, and can search for traits in the dataset: Hippocampus Consortium M430v2 (Jun06)

However I do get an error when I try to run any of the different mapping tools:

  Marker regression line 78
  self.markers = dataset.group.get_markers()
  Error: no JSON object could be decoded

Is this due to marker data being missing?
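
For context, "No JSON object could be decoded" is the ValueError message that Python 2's json module produces when the text it is given is not JSON, including an empty string. One plausible cause, not confirmed in this thread, is therefore that the marker data is empty or absent. The sketch below, written for Python 3, shows a defensive parse that separates the two cases; `parse_markers` is a hypothetical helper, not the GN2 `get_markers` implementation.

```python
import json


def parse_markers(raw_text):
    """Parse marker JSON text, with distinct errors for empty and invalid input.

    Hypothetical helper for illustration only.
    """
    if raw_text is None or not raw_text.strip():
        raise ValueError("marker data is empty; the marker source may be missing")
    try:
        return json.loads(raw_text)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        raise ValueError("marker data is not valid JSON: %s" % exc) from exc


markers = parse_markers('[{"name": "marker_1"}]')

try:
    parse_markers("")
except ValueError as exc:
    empty_message = str(exc)

print(markers, empty_message)
```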

Additionally I get errors on:

Can we add those 2 missing tables to the zip file?

ghost commented 9 years ago

Added tables: db_webqtl_s.Docs db_webqtl_s.News

https://s3.amazonaws.com/genenetwork2/db_webqtl_s.zip

ghost commented 9 years ago

Download https://s3.amazonaws.com/genenetwork2/db_webqtl_s.zip and unzip it, then run:

chown -R mysql:mysql db_webqtl_s/
chmod 700 db_webqtl_s/
chmod 660 db_webqtl_s/*

Finally, restart the MySQL service.
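
The permission steps above can also be scripted. The following is a minimal sketch, assuming the extracted directory is flat (table files only, no subdirectories); `apply_mysql_permissions` is a hypothetical helper name. Changing ownership normally requires root, so the chown step runs only when an owner is passed.

```python
import os
import shutil
from pathlib import Path


def apply_mysql_permissions(db_dir, owner=None, group=None):
    """Set mode 700 on the directory and 660 on the files directly inside it.

    If `owner` is given, also change ownership of the directory and its
    entries (assumes a flat directory; this usually needs root).
    """
    db_dir = Path(db_dir)
    os.chmod(db_dir, 0o700)
    if owner is not None:
        shutil.chown(db_dir, user=owner, group=group)
    for entry in db_dir.iterdir():
        if owner is not None:
            shutil.chown(entry, user=owner, group=group)
        if entry.is_file():
            os.chmod(entry, 0o660)
```

Run as root, a call such as `apply_mysql_permissions("db_webqtl_s", owner="mysql", group="mysql")` would correspond to the chown and chmod commands listed above; restarting MySQL remains a separate step.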

DannyArends commented 8 years ago

Thanks, seems to work...

Could we add the WGCNA example dataset to the genenetwork database (and the small subset)?

Then I can use that as a test dataset for WGCNA integration in GN2. Additionally this might be nice for future workshops, since people can then see how to use WGCNA in GN2 compared to using it in R.

The example dataset is at: http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-Data.zip

We do however need to reformat it into GN2 structure.

robwwilliams commented 8 years ago

Dear Danny, Lei and team,

This should be easy. That data set, along with all other data sets for this cross, is already in the full GN1 database. In fact, I recently corrected errors in sex assignment in this database. GN1 has phenotypes, genotypes, and four gene expression data sets (including the liver data set). The liver data set is presented as Male, Female, and Combined.


Here is a piece of the CSV file with the case IDs used in the Horvath example:

Mice Number Mouse_ID Strain sex DOB parents Western_Diet Sac_Date weight_g length_cm ab_fat other_fat total_fat comments 100xfat_weight Trigly Total_Chol HDL_Chol UC FFA Glucose LDL_plus_VLDL MCP_1_phys Insulin_ug_l Glucose_Insulin Leptin_pg_ml Adiponectin Aortic lesions Note Aneurysm Aortic_cal_M Aortic_cal_L CoronaryArtery_Cal Myocardial_cal BMD_all_limbs BMD_femurs_only
1 F2_290 290 306-4 BxH ApoE-/-, F2 2 3/22/02 229232 5/14/02 9/11/02 36.9 9.9 2.53 2.26 4.79 NA 12.98102981 53 1167 50 484 121 437 1117 175.85 924 0.472943723 245462 11.274 496250 NA 16 0 17 0 0 NA NA
2 F2_291 291 307-1 BxH ApoE-/-, F2 2 3/22/02 232 5/14/02 9/11/02 48.5 10.7 2.9 2.97 5.87 NA 12.10309278 61 1230 32 592 173 572 1198 92.43 5781 0.098944819 84420.88 7.099 NA NA 16 4 0 2 4 0.0548 0.0773
3 F2_292 292 307-2 BxH ApoE-/-, F2 1 3/22/02 232 5/14/02 9/11/02 45.7 10.4 1.04 2.31 3.35 NA 7.330415755 41 1285 81 460 96 497 1204 196.398 2074 0.239633558 105889.76 5.795 218500 NA 0 0 11 0 0 0.0554 0.08065
4 F2_293 293 307-3 BxH ApoE-/-, F2 1 3/22/02 232 5/14/02 9/11/02 50.3 10.9 0.91 1.89 2.8 NA 5.566600398 271 1299 64 476 122 553 1235 97.466 11874 0.046572343 100398.68 5.495 61250 NA 0 0 0 0 236 0.0597 0.0868
5 F2_294 294 307-4 BxH ApoE-/-, F2 1 3/22/02 232 5/14/02 9/11/02 44.8 9.8 1.22 2.47 3.69 NA 8.236607143 114 1410 50 516 118 535 1360 95.452 9181 0.058272519 130846.3 6.868 243750 NA 12 10 0 0 0 NA NA
6 F2_295 295 308-1 BxH ApoE-/-, F2 1 3/22/02 232 5/14/02 9/11/02 39.2 10.2 3.06 2.49 5.55 NA 14.15816327 72 1533 18 620 106 382 1515 144.27 485 0.787628866 75166.22 17.328 104250 NA 17 2 0 0 0 0.0557 0.077
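
Before reformatting a file like this for import, a consistency check on its derived columns can catch misaligned fields. The sketch below rests on an inference from the sample values, not on anything stated in this thread: that 100xfat_weight equals 100 × total_fat / weight_g, and that Glucose_Insulin equals Glucose / Insulin_ug_l. The values are copied from the first two sample records above; the dictionary keys are hypothetical names chosen for the example.

```python
# Values copied from the first two sample records; key names are illustrative.
samples = [
    {
        "mouse": "F2_290", "weight_g": 36.9, "total_fat": 4.79,
        "fat_pct": 12.98102981, "glucose": 437, "insulin": 924,
        "glucose_insulin": 0.472943723,
    },
    {
        "mouse": "F2_291", "weight_g": 48.5, "total_fat": 5.87,
        "fat_pct": 12.10309278, "glucose": 572, "insulin": 5781,
        "glucose_insulin": 0.098944819,
    },
]


def derived_columns_consistent(row, tol=1e-6):
    """Check the two derived columns against the assumed formulas."""
    fat_ok = abs(100.0 * row["total_fat"] / row["weight_g"] - row["fat_pct"]) < tol
    ratio_ok = abs(row["glucose"] / row["insulin"] - row["glucose_insulin"]) < tol
    return fat_ok and ratio_ok


results = {row["mouse"]: derived_columns_consistent(row) for row in samples}
print(results)
```

If a record fails such a check after parsing, the likely culprit is a field split in the wrong place, for example in the Strain value, which itself contains spaces and a comma.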

Rob

pjotrp commented 8 years ago

I have moved the test database to GNU Guix. A direct download is possible through http://files.genenetwork.org/raw_database/

pjotrp commented 8 years ago

@lyan6 can you document the steps you took to create this smaller database? Thanks!

ghost commented 8 years ago

Finished.

https://github.com/genenetwork/genenetwork/blob/master/web/webqtl/maintainance/gndb-shrink.sql

pjotrp commented 8 years ago

Thanks!

pjotrp commented 7 years ago

@lyan6 can we deploy the small database on Lily?

leiyan commented 7 years ago

I deployed a small GN database on Lily, and the db name is “db_webqtl_s”.

pjotrp commented 7 years ago

Thanks!