NCHlab / RTPEA

Retroelement Protein Expression Atlas
http://www.rtpea.com

JSON structure #26

Closed NCHlab closed 5 years ago

NCHlab commented 6 years ago

We need to revamp the JSON structure, as it isn't well formatted.

Based on comments from online users:

After seeing your data as JSON, I'd vote that you work on cleaning up the formatting of the data you're feeding RT. It seems like the samples are unnecessarily nested as an object within an array. On top of that, their index number is set as a key, which complicates things. I'd probably implement a map function to format the data in a more digestible manner. Otherwise you're just going to be fighting your poorly formatted data all day long, and it will not be fun to make changes and develop your React Table. When you have IDs set as keys instead of values, it makes it a huge pain to do various things.
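A minimal sketch of the map-style reshaping suggested in the comment above, in Python. The old layout isn't shown in this thread, so the input shape here (an array wrapping an object keyed by index strings) and the names `old_samples`, `samples_to_list` and `sample_id` are assumptions for illustration only.

```python
def samples_to_list(old_samples):
    """Turn [{"0": {...}, "1": {...}}] into [{"sample_id": 0, ...}, ...]."""
    rows = []
    for wrapper in old_samples:            # each wrapper is an object keyed by index
        for key, fields in wrapper.items():
            row = {"sample_id": int(key)}  # the former key becomes a plain value
            row.update(fields)
            rows.append(row)
    return sorted(rows, key=lambda r: r["sample_id"])


# Hypothetical input in the assumed old layout.
old_samples = [{"1": {"tissue_type": "ET"}, "0": {"tissue_type": "AD"}}]
flat_samples = samples_to_list(old_samples)
```

With the ID stored as a value, each entry can be sorted, filtered or passed to a table component without first unpacking its key.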

NCHlab commented 6 years ago

New JSON to follow, using this format. We need to replace Snumber with a better name that represents the sample number.

{
    "_id" : ObjectId("5afb75518b19647dea6232c5"),
    "PXD" : "PXD1402",
    "study" : "kinases",
    "disease" : "diseased",
    "sample" : [
        {
            "Snumber" : 1,
            "replicate" : 6.0,
            "file_name" : "name.mgf; name2.mgf",
            "phenotype" : "unknown",
            "tissue_type" : "EIUSMOD",
            "ORF1p" : { "confidence" : 0.0 },
            "ORF2p" : {
                "confidence" : 44.0,
                "total_peptides_identified" : 19.0,
                "unique_peptides" : 16.0,
                "validated_psm_matches" : 13.0,
                "loci" : "chr10:109812438-109818457 5'pad=0 3'pad=0 strand=-",
                "PTMs" : { "identified" : "yes", "fixed" : "no", "variable" : "Oxidation of Methionine" }
            },
            "ORF0" : { "confidence" : 5.0 },
            "ORF1p_variants" : 26.0,
            "ORF2p_variants" : 27.0,
            "HERV-K" : {
                "confidence" : 6.0,
                "total_peptides_identified" : 49.0,
                "unique_peptides" : 49.0,
                "validated_psm_matches" : 31.0,
                "PTMs" : { "identified" : "no" }
            },
            "HERV-A" : { "confidence" : 0.0 },
            "HERV-V" : { "confidence" : 11.0 }
        },
        {
            "Snumber" : 2,
            "replicate" : 8.0,
            "file_name" : "name5.mgf; name3.mgf",
            "phenotype" : "unknown",
            "tissue_type" : "ET",
            "ORF1p" : { "confidence" : 47.0 },
            "ORF2p" : {
                "confidence" : 8.0,
                "total_peptides_identified" : 21.0,
                "unique_peptides" : 45.0,
                "validated_psm_matches" : 46.0,
                "loci" : "chr10:109812438-109818457 5'pad=0 3'pad=0 strand=-",
                "PTMs" : { "identified" : "yes", "fixed" : "no", "variable" : "Oxidation of Methionine" }
            }
        }
    ]
}
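As a rough illustration of how this format could feed a table, the sketch below flattens one study document into one row per sample. The function name `sample_rows` and the `<protein>_confidence` column naming are assumptions made for this example, not part of the project.

```python
def sample_rows(doc):
    """Yield one flat dict per sample, suitable for a table component."""
    for sample in doc.get("sample", []):
        row = {
            "PXD": doc["PXD"],
            "study": doc["study"],
            "disease": doc["disease"],
            "Snumber": sample["Snumber"],
            "tissue_type": sample.get("tissue_type"),
        }
        # Nested protein objects carry a "confidence"; lift it to a flat column.
        for name, value in sample.items():
            if isinstance(value, dict) and "confidence" in value:
                row[name + "_confidence"] = value["confidence"]
        yield row


# Trimmed-down copy of the document above (the ObjectId is left out
# because it is MongoDB shell syntax, not plain JSON).
doc = {
    "PXD": "PXD1402", "study": "kinases", "disease": "diseased",
    "sample": [
        {"Snumber": 1, "tissue_type": "EIUSMOD",
         "ORF1p": {"confidence": 0.0}, "ORF2p": {"confidence": 44.0}},
        {"Snumber": 2, "tissue_type": "ET",
         "ORF1p": {"confidence": 47.0}, "ORF2p": {"confidence": 8.0}},
    ],
}
rows = list(sample_rows(doc))
```

Because `sample` is now a plain array of objects, no key unpacking is needed before iterating.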

NCHlab commented 6 years ago

Working JSON file and generator file uploaded to Dropbox and attached here:

JSON NEW FORMAT GENERATOR FILE.txt
JSON new format.txt

Nazrath10R commented 6 years ago

Ok, so this is adaptable so far

If a single sample contains several variants, how do you want them represented? E.g. Snumber 1 has variant "A" with confidence x, variant "B" with confidence y, and variant "C" with confidence z.

A nested array inside?

NCHlab commented 6 years ago

{
    "_id" : ObjectId("5b0770827ad1360027d244f4"),
    "PXD" : "PXD2257",
    "study" : "species",
    "disease" : "healthy",
    "sample" : [
        {
            "Snumber" : 1.0,
            "replicate" : 6.0,
            "file_name" : "name.mgf; name2.mgf",
            "phenotype" : "unknown",
            "tissue_type" : "AD",
            "variant_A" : 25,
            "variant_B" : 25,
            "variant_C" : 25,
            "ORF1p" : { "confidence" : 0.0 },

NCHlab commented 6 years ago

Or do:

"sample" : [
    {
        "Variant" : "1A",
        "replicate" : 6.0,
        "file_name" : "name.mgf; name2.mgf",
        "phenotype" : "unknown",
        "tissue_type" : "AD",
        "ORF1p" : { "confidence" : 0.0 },

Nazrath10R commented 6 years ago

Ok. Let's see how that comes out

Nazrath10R commented 6 years ago

This is the script I have been working on for the last few days to get the output closer to our format. It's a very preliminary script and not well annotated, but it gets the job done for now. However, it only outputs the variants as two arrays (one for names and one for confidence scores), so that's something you may want to change or adapt based on what suits you. Feel free to edit it or use a different language. You could also just import the JSON files it produces somewhere else and carry out the manipulations as you please; that's fully your choice. The script, the output table and an example JSON are all in our Dropbox folder (just change the path and you should be able to run it). Let me know if any of it is unclear!

https://github.com/NCHlab/RTPEA/commit/9b84c6786f1d53b3e70f96044f9cbf6d1a4e24da
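One possible way to adapt the two parallel arrays mentioned above is to pair them into a single nested array of objects. This is only a sketch: the key names `variant_names`, `variant_confidence` and `variants`, and the helper `pair_variants`, are hypothetical and do not come from the script's actual output.

```python
def pair_variants(names, confidences):
    """Pair each variant name with its confidence score."""
    if len(names) != len(confidences):
        raise ValueError("names and confidences must have the same length")
    return [{"name": n, "confidence": c} for n, c in zip(names, confidences)]


# Hypothetical sample as it might look with two parallel arrays.
sample = {
    "Snumber": 1,
    "variant_names": ["A", "B", "C"],
    "variant_confidence": [25, 25, 25],
}

# Replace the two arrays with one nested array of {name, confidence} objects.
sample["variants"] = pair_variants(
    sample.pop("variant_names"), sample.pop("variant_confidence")
)
```

Keeping each name next to its score avoids relying on the two arrays staying in the same order.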

NCHlab commented 6 years ago

Python script created; it fixes the R output into the correct JSON format.

(rename to .py) Fix_Json_ORF.txt