broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

read_json fails on valid JSON files #4519

Open antonkulaga opened 5 years ago

antonkulaga commented 5 years ago

read_json almost never works for me. Here, for instance, is a JSON file that is perfectly valid but breaks read_json:

{
  "id" : "GSM1698568",
  "gse" : [
    "GSE69360"
  ],
  "title" : "Biochain_Adult_Liver",
  "sampleType" : "SRA",
  "organism" : {
    "name" : "Homo sapiens",
    "taxid" : "9606"
  },
  "sequencer" : "Illumina HiSeq 2000",
  "characteristics" : {
    "number of donors" : "1",
    "age" : "64 years old",
    "tissue" : "Liver",
    "vendor" : "Biochain",
    "isolate" : "Lot no.: B510092",
    "gender" : "Male"
  },
  "library" : {
    "strategy" : "RNA-Seq",
    "selection" : "cDNA",
    "source" : "transcriptomic"
  },
  "extraction" : {
    "source" : "Biochain Adult Liver",
    "molecule" : "total RNA",
    "protocol" : "2 different fetal normal tissues and 6 different adult normal tissues were purchased from different sources (Agilent, Biochain and OriGene). The qualities of these total RNA were tested using the Agilent Bioanalyzer 2100 Eukaryote Total RNA Nano Series II. Only total RNAs with a RIN score of more than 7 were used for RNA-Seq library construction\nRibosomal RNA (rRNA) was removed from total RNA using the RiboMinus™ Eukaryote Kit for RNA-Seq from Ambion. The ribosomal RNA depleted RNA fraction is termed the RiboMinus™ RNA fraction and is enriched in polyadenylated (polyA) mRNA, non-polyadenylated RNA, pre-processed RNA, tRNA, and may also contain regulatory RNA molecules such as microRNA (miRNA) and short interfering RNA (siRNA), snRNA, and other RNA transcripts of yet unknown function. Ambion RiboMinus rRNA depletion was performed as described in the manufacturer’s protocol (Pub. Part no.: 100004590, Rev. date 2 December 2011) following the standard protocol.\nTruSeq RNA Sample Preparation was performed on the RiboMinus™ RNA fraction as described in the manufacturer’s protocol (Pub. Part no.: 15026495 Rev. F March 2014) following the low sample protocol.\nThe libraries were sequenced on Illumina’s HiSeq 2000 instrument following standard protocol.",
    "processing" : "Data quality check using fastQC version 0.11.2.\nAlignment of unpaired unstranded reads using STAR version 2.4.0.\nQuantification of transcripts and isoforms using RSEM version 1.2.21 using rsem-calculate-expression, both alignment and quantification was done using the STAR_RSEM.sh pipeline (https://github.com/ENCODE-DCC/long-rna-seq-pipeline/blob/master/DAC/STAR_RSEM.sh)\nThe programe featurecounts version 1.4.6-p2 from the SourceForge Subread package was used to produce a summary file of counts from all the alignement .bam files.\nThe summary file of counts (RNAseq.counts) was used to plot the multidimensional scaling plot using edgeR version 3.1.3.\nThe *.osc.gz files were loaded into the genome browser ZENBU and was used visualize the transcripts. Screen shots were captured.\nGenome_build: hg19 with Gencode V19 annotation\nSupplementary_files_format_and_content: .osc files are simple tab delimited files. They were generated by combining the isoform.results files outputed by RSEM with the gencode v19 .gtf file. It contains abundance measurements and transcript isoforms. It also contains metadata that is inputed into ZENBU.\nSupplementary_files_format_and_content: RNAseq.counts is a simple tab delimited file containing the counts for all the RNA-seq libraries for each gene (summary file of counts)."
  },
  "relations" : {
    "BioSample" : "https://www.ncbi.nlm.nih.gov/biosample/SAMN03610550",
    "SRA" : "https://www.ncbi.nlm.nih.gov/sra?term=SRX1020495"
  },
  "status" : {
    "submitted" : "May 29 2015",
    "updated" : "Jun 01 2015"
  },
  "runs" : [
    {
      "run" : {
        "Run" : "SRR2014238",
        "ReleaseDate" : "2015-05-25 05:44:11",
        "LoadDate" : "2015-05-25 05:38:29",
        "AssemblyName" : "",
        "download_path" : "https://sra-download.ncbi.nlm.nih.gov/traces/sra29/SRR/001967/SRR2014238",
        "Experiment" : "SRX1020495"
      },
      "stats" : {
        "spots" : 85220810,
        "bases" : 12953563120,
        "spots_with_mates" : 85220810,
        "avgLength" : 152,
        "size_MB" : 7790.0
      },
      "library" : {
        "LibraryName" : "Biochain_Adult_Liver",
        "LibraryStrategy" : "RNA-Seq",
        "LibrarySelection" : "other",
        "LibrarySource" : "TRANSCRIPTOMIC",
        "LibraryLayout" : "PAIRED"
      },
      "sample" : {
        "Platform" : "ILLUMINA",
        "Model" : "Illumina HiSeq 2000",
        "SRAStudy" : "SRP058036",
        "BioProject" : "PRJNA283012",
        "Study_Pubmed_id" : "",
        "ProjectID" : "283012",
        "Sample" : "SRS931038",
        "BioSample" : "SAMN03610550",
        "SampleType" : "simple",
        "TaxID" : "9606",
        "ScientificName" : "Homo sapiens",
        "SampleName" : "Biochain_Adult_Liver"
      },
      "subject" : {
        "Subject_ID" : "",
        "Sex" : "male",
        "Disease" : "",
        "Tumor" : "no",
        "Affection_Status" : "",
        "Analyte_Type" : "",
        "Histological_Type" : "",
        "Body_Site" : ""
      },
      "other" : {
        "InsertSize" : "158",
        "InsertDev" : "41",
        "g1k_pop_code" : "",
        "source" : "",
        "g1k_analysis_group" : "",
        "CenterName" : "CANCER SCIENCE INSTITUTE OF SINGAPORE",
        "Submission" : "SRA266153",
        "dbgap_study_accession" : "",
        "Consent" : "public",
        "RunHash" : "0189DDD0D225B2E8DEA03FC1EEFCB0F5",
        "ReadHash" : "98D4DE007275783AF1596BEDD6502C11"
      }
    }
  ]
}
aednichols commented 5 years ago

Do you happen to have the resulting error handy?