Regarding issue 43:
pgm1.bio.nyu.edu
From Karl:
The raw data seems to be in /results/sn25080361/ (sn25080361 is the serial number). As of Fri, 31 May 2013 11:16:52 -0400 it contains 854 GB.
There, each run has a directory R_<date>_<name-of-the-run>, like R_2013_05_25_01_52_07_user_SN2-13-Justin_Monica_Seed1_2013-24-05.
So far most runs respect the pattern user_SN2-<nb>-<more-naming>, but not all: for example R_2013_05_09_10_26_21_user_ION_CONTROL_TEST_RUN, or R_2013_01_15_03_18_52_user_2013-01-14_318_Suphir (which should be SN2-3).
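A minimal sketch of how the naming pattern above could be checked from a script. The function name `run_number` is hypothetical, and the regular expression assumes the date part is always six numeric fields (YYYY_MM_DD_hh_mm_ss), which holds for the examples quoted here but has not been verified against every run.

```shell
# Print <nb> for names of the form R_<date>_user_SN2-<nb>[-<more-naming>].
# Exits non-zero (and prints nothing) for names that do not follow the pattern.
run_number() {
  printf '%s\n' "$1" |
    sed -n -E 's/^R_[0-9]{4}(_[0-9]{2}){5}_user_SN2-([0-9]+)(-.*)?$/\2/p' |
    grep .
}
```

Something like `for d in R_*; do run_number "$d" >/dev/null || echo "non-standard: $d"; done`, run inside the raw data directory, would then list the runs that need manual attention.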
The .dat files (which the manual says are the raw data) are most often about 50 MB each, and there are a lot of them.
The run metadata seems to be in explog.txt (written at the beginning of the run) and explog_final.txt (written at the end). Those are mostly Key : Value text files.
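Since those files are mostly Key : Value lines, a single field can probably be pulled out with a few lines of awk. This is only a sketch: the function name `explog_get` is hypothetical, and it assumes the key never contains a colon, so each line is split at its first colon (values such as timestamps may contain more). Lines without a colon are skipped.

```shell
# Print the value of the first line whose key equals $1 in the file $2.
# Exits non-zero if the key is not present.
explog_get() {
  awk -v key="$1" '
    {
      i = index($0, ":")                 # split at the first colon only
      if (i == 0) next                   # not a Key : Value line
      k = substr($0, 1, i - 1)
      v = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim surrounding blanks
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; found = 1; exit }
    }
    END { exit !found }
  ' "$2"
}
```

It would be called as `explog_get "<some key>" explog_final.txt`, with the key copied from the file itself.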
Each run seems to also have a directory in /results/analysis/output/Home/ (315 GB as of Fri, 31 May 2013 11:28:47 -0400).
Depending on configuration, FASTQ output has to be requested explicitly from the web interface (or set to run every time, cf. p. 38). Right now it seems to be set up with Autorun. Once it's done, the files appear in
./$RUN_NAME/plugin_out/FastqCreator_out/
It seems to be running Apache2, some Django, some PHP, some JSP (Apache Catalina), PostgreSQL, and even some LaTeX to generate reports (like this one).
The /results/analysis directory seems to be completely served by Apache; there are even some PHP files in the middle of the analyses.
find /results/analysis/output/Home/ -name "*.php" | wc -l
44
For example, when logged in as a user I can see the file /results/analysis/output/Home/Auto_user_2013-01-14_318_Suphir_3_003/status.txt
at http://pgm1.bio.nyu.edu/output/Home/Auto_user_2013-01-14_318_Suphir_3_003/status.txt
To explore the PostgreSQL database:
psql iondb -U ion
Some of it seems to be the default Django tables, but there are more custom ones.
gencore@bowery-0-3:/scratch/gencore/pgm-25080361/rsync_raw $ qsub script_rsync_raw.pbs
2632963.crunch.local
with
find . -type f -exec md5sum {} >> md5s_2013-10-14 \;
#use "topfind";;
#thread;;
#require "core";;
open Core.Std
let () =
  let file1 = "md5s_2013-10-14" in
  let file2 = "md5s_2013-10-14-torrent-server" in
  let map_of file =
    let open In_channel in
    with_file file ~f:(fun ic ->
        fold_lines ~init:String.Map.empty ic ~f:(fun map line ->
            Scanf.sscanf line "%s %s" (fun data key ->
                Map.add map ~key ~data)
          ))
  in
  let say fmt = ksprintf (eprintf "* %s\n%!") fmt in
  let go map1 map2 =
    Map.iter map1 (fun ~key ~data ->
        match Map.find map2 key with
        | None -> say "file %s not found" key
        | Some s when s = data -> ()
        | Some s -> say "file %s map1: %S map2: %S" key data s)
  in
  let map1 = map_of file1 in
  let map2 = map_of file2 in
  say "iter map1 trying map2:";
  go map1 map2;
  say "iter map2 trying map1:";
  go map2 map1;
  say "Done."
$ ocaml compare_md5s.ml
* iter map1 trying map2:
* file ./R_2013_10_09_04_20_19_user_SN2-19/.acq_0598.dat.fC9sF1 not found
* file ./md5s_2013-10-14 not found
* file ./rsync_raw/rsync_raw.stderr not found
* file ./rsync_raw/rsync_raw.stdout not found
* file ./rsync_raw/script_rsync_raw.pbs not found
* iter map2 trying map1:
* Done.
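For machines without OCaml and Core installed, roughly the same comparison can be sketched with awk. This is an alternative sketch, not the script that produced the output above: the function name `compare_manifests` is hypothetical, the first manifest is assumed to be non-empty, and, like the OCaml version, it assumes paths contain no whitespace.

```shell
# Compare two md5sum manifests ("<checksum>  <path>" per line).
# Reports checksum mismatches and paths present in only one of the files.
compare_manifests() {
  awk '
    NR == FNR { sum[$2] = $1; next }                  # first file: checksum per path
    !($2 in sum) { print "only in second: " $2; next }
    { seen[$2] = 1 }
    sum[$2] != $1 { print "mismatch: " $2 }
    END { for (p in sum) if (!(p in seen)) print "only in first: " p }
  ' "$1" "$2" | sort
}
```

With the two manifests from above this would be `compare_manifests md5s_2013-10-14 md5s_2013-10-14-torrent-server`. An empty output means both sides agree.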
Understand the directory structure of the PGM, and create a script to transfer each run's data to Bowery.
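One possible shape for such a transfer script, as a sketch only. The contents of script_rsync_raw.pbs are not shown in this thread, so nothing here is taken from it. The function name `plan_transfers` is hypothetical, and it only prints one rsync command per run directory instead of running it, so the plan can be reviewed first. It assumes run directory names contain no whitespace.

```shell
# Print one rsync command for each R_* run directory found under $1,
# targeting a directory of the same name under $2. Nothing is copied.
plan_transfers() {
  src="$1"
  dest="$2"
  for run in "$src"/R_*/; do
    [ -d "$run" ] || continue            # no match: the glob stays literal
    name=$(basename "$run")
    printf 'rsync -a --partial %s %s\n' "$src/$name/" "$dest/$name/"
  done
}
```

The source would be the raw data directory (/results/sn25080361) and the destination whatever location is chosen on Bowery. Transferring run by run makes it easier to re-run a single failed transfer and to checksum each run separately afterwards.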