Closed teatree1212 closed 8 years ago
Is the number of pictures per sample constant? Is it a "reasonably" small number (like max 3 pictures per sample)?
On the other hand - perhaps they can submit all images related to the same sample to a single "directory" inside CyVerse, so we can link to the entire directory for a given sample/PSU?
In the current case it will be 3 pictures per sample, but I don't know how many pictures would be possible. Pictures produced in a time series could be many more.
For now, all pictures will be stored in one single folder per project in CyVerse. Apparently this makes the retrieval of pictures quicker.
If you think that having subdirectories for each sample is better, I will try and argue for that model in our next meeting.
Would it be possible to add another column to the trial template called "comments" (and connect it to the PlantScoringUnits.comments field in the database), which we then use to record the picture names?
@teatree1212 A couple of questions:
For now they would just submit the names of the files, separated by semicolons, in that column. CyVerse, the location they will submit the pictures to, will be the default location to submit pictures. In the future, as additional development, the image submission should happen via BIP, but the images should be sent to and saved in CyVerse.
An image would be handled like a trait, so it should be a new column in trait_scored
Do you mean a single image is a "score_value" of sorts? So, these three images per PSU are like values for three different (though similar) trait descriptors, only recorded visually, not as a number?
This may actually be the best way to record them, you are right. If you permit strings, that will be no problem then (for now, with just the image names in the columns) and the cross-linking could be done afterwards. But how should the images be displayed in the future, when they sit in such a table format together with numerical measurements in the same table? What are your thoughts?
We were talking with @wjurkowski today and he explained to me, what I didn't know, that these images are actually used to measure trait scores (e.g. the surface of a leaf - through image processing software). Therefore they should not be treated as trait scores, but rather as raw/source data, that was used to obtain specific trait scores.
@teatree1212 Annemarie - is this description of the issue correct? Are those pictures directly related to a specific trait score and therefore could be treated as a "source" for it?
yes, this is correct. It is reflected in the trait definitions I sent you in one file:
"RootReader2D (RR2D, Clark et al., 2013) was used to analyse images. First, a ‘batch process’ was carried out which automatically ‘thresholds’, ‘skeletonises’ and ‘builds segments’ of all images in bulk. Then, the root system of individual images was measured by placing a marker at the base and tip of the primary root. From these markers, RR2D automatically calculates primary root length (PRL), lateral root length (LRL) of all laterals and lateral root number (LRN)."
So the traits primary root length, lateral root length and lateral root number are calculated from the pictures. For one PSU, there are hence these calculated values for the traits above and apparently several pictures associated with the values. I first thought they could go into the "trait_scores.comments" column and then be cross-linked with the pictures, so people can access the raw data associated with those very scores.
But in some way they are "score_values" themselves, because they are just "measurements" (the images) which have been processed into other "measurements" (the corresponding values for each trait, like PRL), which is why I liked your idea of having them represented as score values. But as there will be no analysis tool doing such analysis on the images, maybe the images just have to be considered an intermediate step in the measurement process of the final values which get saved in the BIP. Maybe we should treat them like sequence data and just cross-link the name in trait_scores.comments ( #502 ) with the respective image in CyVerse.
?
Having them as actual scores has the merit that other people can submit only pictures, if they want to (for instance if they do not plan to, or are not able to, measure any numerical values out of these pictures).
Assuming we go this way, will we register each image as a separate trait? Or in groups: one trait for the single root image, second trait for the three leaf images?
Anyway, I have just tested it and comma-separated string values for TraitScore.score_value are acceptable. So, in case there are 2 images, uploaded to CyVerse, which represent a single score, you would submit something like:
tonikazic/field_phenotyping_repo/phenotypes/IMG_1403.JPG,tonikazic/field_phenotyping_repo/phenotypes/IMG_1409.JPG
Each value is a relative URL, valid for the http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/ folder. I can develop a small extension of the data tables presentation which will detect these relative "paths", and turn them into web links pointing at exact CyVerse files. To make things more compact, I can name these links "image_1", "image_2", like here:
(don't mind the trait name - just testing).
Should I go ahead with that?
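To make the proposal above concrete, here is a minimal sketch of the link-building step, assuming the prefix quoted in this comment and plain string handling. The function name `score_value_to_links` and the HTML output format are hypothetical and are not taken from the BIP code base.

```python
from html import escape

# Prefix quoted in the comment above. Treating it as a module-level constant
# is an assumption; the real application may configure it differently.
CYVERSE_PREFIX = "http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/"


def score_value_to_links(score_value, prefix=CYVERSE_PREFIX):
    """Turn a comma-separated list of relative paths into compact HTML links."""
    # Split on commas and drop empty entries (e.g. from a trailing comma).
    paths = [p.strip() for p in score_value.split(",") if p.strip()]
    links = []
    for index, path in enumerate(paths, start=1):
        url = prefix + path
        # Name the links image_1, image_2, ... as suggested in the comment.
        links.append('<a href="{}">image_{}</a>'.format(escape(url, quote=True), index))
    return links


example = (
    "tonikazic/field_phenotyping_repo/phenotypes/IMG_1403.JPG,"
    "tonikazic/field_phenotyping_repo/phenotypes/IMG_1409.JPG"
)
links = score_value_to_links(example)
```

Keeping the prefix as a parameter means the stored score values stay relative, so a later change of the CyVerse base address would only touch one place.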
I like your solution for future image submissions and it will be valuable for this "image raw data" and other "image data".
1) But let's stick with @wjurkowski's way of defining the pictures as "raw data", as these pictures are used to calculate traits, and not have them submitted as actual traits.
Maybe you can create an additional "trait" (but not really a trait...) that is called "images", and we select this trait and add the picture names in a comma-separated fashion in there. This "trait" would always need to be used to call images that were used as raw data for the traits that were submitted to BIP. Here is a definition of this universal image "trait", which you can import into BIP. Trait__image_data.xlsx There, it will then always be the case that the image names will be associated with an image stored in CyVerse. ---so this is "raw image data"
(the next paragraph is probably image-development related: )
2) ---If the images themselves are the actual "trait"- again, this would need to be defined in the trait descriptor, these images would need to be submitted in the way data gets submitted to the BIP and the way you were thinking of adding these images to BIP. In the future, these images then should be made visible as "the traits" rather than just displaying their names. This is subject to future development and discussion.
HOWEVER, after feedback: even though we are not storing image names in this column, a "comments" column should be added to the trait_scoring_template. This is subject to another issue which I will open now.
@teatree1212 Thanks for these ideas, I'll consider them. But before that, please tell me what exactly do you need for the early August submissions, so I can add it right now, and what is not immediately needed and could be a subject to the new settlement. I am absent the entire next week (holidays), by the way.
This paragraph is what I envision for this submission, sorry I didn't make it clear enough: Maybe you can create an additional "trait" (but not really a trait...) that is called "images", and we select this trait and add the picture names in a comma-separated fashion in there. This "trait" would always need to be used to call images that were used as raw data for the traits that were submitted to BIP. Here is a definition of this universal image "trait", which you can import into BIP. Trait__image_data.xlsx There, it will then always be the case that the image names will be associated with an image stored in CyVerse. ---so this is "raw image data"
I can create and provide such a TraitDescriptor to store images. However, can't we relate that "image" TraitDescriptor with the real Trait it represents? Which would be "Lateral root length" for root images, and something related to leaves for the leaves pictures?
It sounds possible. However, with these images you can make up many more traits which haven't been measured on them yet. E.g. these images have been used to calculate several trait descriptors: lateral root length, which you mentioned above, and also lateral root number and primary root length. However, they haven't been used to calculate (making this up now) root colour or root diameter. The images would then have to be linked to those traits afterwards by the user. This could be possible by making the user select which traits the "image" trait (the images) was used for. What do you think?
@kammerer , I think Nuanda has not imported the "Image" trait yet, which is needed for our submission on the 4th. Trait_descriptors have been uploaded by you in a similar fashion with similar headers before. Maybe there is a script to do so somewhere.
Please create a new trait descriptor called "Images" with the definitions given in the file Trait__image_data.xlsx mentioned above. Information regarding this trait needs to be parsed into the database tables according to the headers I provided.
Is it possible to do that by the 4th? Let me know, and let me know once it is done.
@teatree1212 I think we really want to have a clear picture there, before the implementation. Let me think for a minute or two and I'll post another comment here.
@kammerer This time we might need a simple user-provided value sanitization, I may ask you for this later.
1) I'm going to create a Trait table record for the "images pseudo-trait". I will not create TraitDescriptors (sorry for not being precise before) as you can do that any way you like during the submission (manual - 2nd step, or automatic - through an API call) - so it's not needed to do that "behind the scenes".
-I cannot create a new trait descriptor name as the submission currently draws from the ontologies only, there is no "OR" feature for that:
-What header is the user supposed to use now for the submission of the /URL/image names?
2) I will also alter the data tables presentation for trait scoring in the Browse section tables for such values (comma-separated CyVerse paths) - so a user can click on them and be transported to CyVerse file browser.
-great.
3) Please, do not use full CyVerse URLs for the trait scores, only the "paths" (like an example in my comment above, 12 days ago). -understood.
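As an illustration of point 3 and of the "simple user-provided value sanitization" mentioned earlier, a check of this kind could be sketched as follows. The rules below (no scheme, no leading slash, a conservative character set per path segment) are assumptions made for the example, not the validation that BIP actually applies, and `sanitize_image_paths` is a hypothetical name.

```python
import re

# Assumed rule: each path segment consists of letters, digits, '.', '_' or '-'.
_SEGMENT = re.compile(r"^[A-Za-z0-9._-]+$")


def sanitize_image_paths(score_value):
    """Return the list of relative paths in score_value, or raise ValueError."""
    paths = []
    for raw in score_value.split(","):
        path = raw.strip()
        if not path:
            continue  # ignore empty entries such as a trailing comma
        # Full URLs and absolute paths are rejected: only relative paths are wanted.
        if "://" in path or path.startswith("/"):
            raise ValueError("expected a relative CyVerse path, got: " + path)
        for segment in path.split("/"):
            # Reject empty segments, '.' and '..', and unexpected characters.
            if segment in ("", ".", "..") or not _SEGMENT.match(segment):
                raise ValueError("unsafe path segment in: " + path)
        paths.append(path)
    return paths
```

For example, `sanitize_image_paths("a/b/IMG_1.JPG, c/IMG_2.JPG")` returns the two paths as a list, while a value starting with `http://` raises `ValueError`.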
You can (or you will be able to, when I finish and we deploy the change). What you see in the "Trait" select field is the content of the Traits table. As I wrote in progress reports 14 and 15, this table was created based on the TO, but then it was extended with:
All these are available in the "Trait" field of the New Trait Descriptor form. Now, I am going to add a new 'images' Trait there as well.
Okay, great. Again, the trait is described in the file above. If you feel the description needs extension, go ahead and edit (:
@Nuanda
sorry for having to open this again, but when I looked into example data that was assigned a DOI in CyVerse, I saw that the link is different to yours.
In fact, after following the link beneath the "Example" heading on this website, I think we can assume that the URL will change to this address once the data (our images) for which a DOI has been requested has been curated: cyverse.org/browse/iplant/home/shared/commons_repo/curated/foldername_in_certain_nomenclature_in_which_all_pictures_are_stored/IMG_XYZ.jpg
These folders ("foldername_in_certain_nomenclature_in_which_all_pictures_are_stored") have already been created by me, and the final address for the images will be:
1) cyverse.org/browse/iplant/home/shared/commons_repo/curated/Thomas_seedlingRootPhenotypingImages_2016/IMG_XYZ.jpg
2) cyverse.org/browse/iplant/home/shared/commons_repo/curated/Alcock_leafPhenotypingImages_2016/IMG_XYZ.jpg
Note on 2): this upload has not yet been confirmed, but I have prepared everything nonetheless so that it is ready when it is ready.
But before, the data will probably actually be stored in bip's home directory. I noticed that I don't have permissions to store things in the shared folder. So that path will be:
1) ...iplant/home/bip/Thomas_seedlingRootPhenotypingImages_2016/IMG_XYZ.jpg
2) ...iplant/home/bip/Alcock_leafPhenotypingImages_2016/IMG_XYZ.jpg
which means that you would have to change the relative URL.
So I think what we can do is:
1) either use two URLs (if that is possible);
2) use the home/shared/commons_repo/curated/ URL, which is the final storage place. This means that the images will not be linked immediately to the names displayed in BIP;
3) use the /home/bip URL first and then switch to the home/shared/commons_repo/curated/ URL. This means that the images will be immediately linked to and displayed in BIP.
if it is a lot of work, don't worry about it and use option 2. If option 1 is easy, use that. I think option 3 is unnecessary fiddling, even though I understand it would only mean to slightly change the query address after a while. I think they are fine with not seeing the images immediately, as long as they receive a DOI for it( which will be my task).
Is the home/bip CyVerse path accessible to everyone?
Also, I don't think the "cyverse.org/browse/iplant/home/shared/" URLs work. If I use a (working) URL like that one:
and I change the server to cyverse.org, it is 404 :(.
hmh.. home/bip is not accessible to everyone, but only BIP. you would have to query using the BIP account access token. But once it is relocated to the /home/shared/commons_repo/, it should be open to everyone.
but.. so I got this URL from an example-website: http://mirrors.cyverse.org/browse/iplant/home/shared/commons_repo/curated/Duitama_rice_variation_2015
which probably doesn't reflect the API query address, I think they still work with iplantcollaborative as system "id" in the AGAVE API.
I located this folder tonikazic in the discovery environment. Did you create it? I somehow cannot create or move folders to the shared environment.
So sorry for the confusion. It means that your "iplantcollaborative" address is correct, but probably needs to be slightly altered in the end:
as navigating to /iplant/home/shared/commons_repo/curated, one sees multiple curated folders where other people's data types are stored. I assume this will be ( to be confirmed by Ramona- see email from Monday, 1 August 2016 at 16:57, espectrum cc'd in) the final destination for the DOI-assigned image data folder.
maybe the default address should probably be(?) :
http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/commons_repo/curated/
No, this is not my account. I simply browsed the repository online to find some images so I can test my code. I have no idea what that is, but it shows plants :).
I will be available all day today.
But just to make things clear with all these comments of mine here: the final path will probably be: 1) http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/commons_repo/curated/Thomas_seedlingRootPhenotypingImages_2016/IMG_XYZ.jpg
-again, I don't have permission to edit or add any folders to the /home/shared/ environment, so I don't think we can post our images there right away, but have to post them to our /home/bip/ folder, apply for a DOI, and then the images will be moved/copied(?) to the shared/commons_repo/curated folder by someone who has the authority to do so.
I understand that you didn't create the shared/tonikazic/field_phenotyping_repo/phenotypes/ folder yourself, as it was last modified in 2015 and it was an example.
So first, the initial address will be this, I am assuming, which is private, but accessible with an API token 1) ...iplant/home/bip/Thomas_seedlingRootPhenotypingImages_2016/IMG_XYZ.jpg
2) ...iplant/home/bip/Alcock_leafPhenotypingImages_2016/IMG_XYZ.jpg
And the final paths will be the paths shown above.
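The switch from the initial to the final location described above could be sketched as a simple prefix rewrite. This assumes that paths are stored relative to the `.../iplant/` root and that the project folder keeps its name when it is moved or copied into the curated area; neither assumption has been confirmed in this thread, and `to_curated_path` is a hypothetical helper.

```python
# Both roots are relative to the ".../iplant/" part of the URL (an assumption).
INITIAL_ROOT = "home/bip/"
FINAL_ROOT = "home/shared/commons_repo/curated/"


def to_curated_path(path):
    """Map a path under home/bip/ to its expected location in the curated area."""
    if path.startswith(INITIAL_ROOT):
        # Keep the project folder and file name, swap only the root.
        return FINAL_ROOT + path[len(INITIAL_ROOT):]
    # Anything else (including already curated paths) is returned unchanged.
    return path


initial = "home/bip/Thomas_seedlingRootPhenotypingImages_2016/IMG_XYZ.jpg"
final = to_curated_path(initial)
```

Because already curated paths pass through unchanged, such a rewrite could be run more than once over the stored values without harm.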
The 3 options from the comment "So I think what we can do" are still under discussion.
Currently we add this prefix to the trait score values:
http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/
So you would need to upload CSV with values like
commons_repo/curated/Alcock_leafPhenotypingImages_2016/IMG_XYZ.jpg
to make this work. Would you rather have me change it to:
http://mirrors.iplantcollaborative.org/browse/iplant/
(and you will add the home/shared/ or home/bip/ - your decision - part to the URL)?
Is home/bip/ not associated with restricted access, meaning that more work needs to be done on your side?
I think, I need to hear back from Ramona about the final destination of the files before I can take a decision on this. Therefore, just stick to:
http://mirrors.iplantcollaborative.org/browse/iplant/shared/
*edit: I meant /home/, but thanks for shortening.
and we will add the final destination at the day of submission. I suppose this was your original suggestion anyways- I apologise for the back and forth (:
Shortened it to:
http://mirrors.iplantcollaborative.org/browse/iplant/
A second issue about this root data: it has pictures and they would like to submit them somewhere. This is not BIP's problem at the moment, but it would be good to have some picture identifiers already submitted with the trial metadata.
Their pictures will be submitted to CyVerse, which will in the future be cross-linked to BIP. As this is not the case yet, it would however be good to have some sort of "picture name" column in the TraitScores table, with which we can later identify each picture stored in CyVerse. So while the people submit their design_factors and trait_scores, they also submit picture_identifiers (which will be multiple per sample, so maybe comma-separated).
Does this sound like something that can reasonably quickly be implemented? Or, if we used the API for the submission of this data, can we just agree on a column with text content where we store this information for now, and which can then be repurposed?
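One practical detail of a comma-separated picture_identifiers column is that, in a CSV submission, the whole list has to be quoted so it stays a single field. The sketch below shows this with Python's standard `csv` module. The other column names and all values are made up for illustration and are not part of any agreed template.

```python
import csv
import io

# Illustrative row only: column names other than picture_identifiers are hypothetical.
fieldnames = ["plant_scoring_unit", "PRL", "picture_identifiers"]
rows = [
    {
        "plant_scoring_unit": "PSU_1",
        "PRL": "12.3",
        "picture_identifiers": "IMG_0001.jpg,IMG_0002.jpg,IMG_0003.jpg",
    }
]

# Write the CSV; the csv module quotes the field containing commas automatically.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read it back and split the single text field into individual picture names.
buffer.seek(0)
records = list(csv.DictReader(buffer))
pictures = records[0]["picture_identifiers"].split(",")
```

Storing the identifiers as one text field keeps the template at a fixed number of columns, whatever the number of pictures per sample turns out to be.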