Meeting August 28th #41 - GitHub issue (cernopendata/opendata.cern.ch)

suenjedt commented 10 years ago

Metadata ingestion will happen manually for the first 14 high-level datasets; in the mid-term we will enable automated ingestion from a controlled list of sources.
CMS is preparing a "guided tour"/how-to document which will accompany every dataset and analysis. To start with, this document will be the same for all primary datasets (this may change later), but it will be different for derived datasets (e.g. the instructions connected to the "pattuples" derived by Ana will point to the code and to the instructions on how to run it). The structure, however, will be the same:
1) selection
2) validation
3) how to reuse
4) limitations

These texts are being prepared by CMS, with the support of Patricia. They should initially be linked from a dedicated box on the right-hand side of the individual records. Patricia will investigate whether parts of this information can be referenced in the metadata to enable a tailored, dataset-specific display. This additional documentation will, however, sit on a separate page and should be exportable as a PDF. It should be a record in itself, with a DOI and a citation recommendation (action on Patricia to prepare that).

All datasets will get a disclaimer, provided by Kati, concerning quality assurance. Its location on the record page is still to be decided, possibly at the bottom of the page.
There will be a set of restricted files containing trigger/selection details, not visible to external users (Kati, please correct the details here!)
There must be an export function for the file names of the 14 high-level datasets, enabling easy integration into the config files; this needs to include the ROOT file names.
A virtual machine image will be stored on the platform; it will become a standalone record with its own DOI.
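The file-name export mentioned above could be sketched as follows, assuming the export is a Python fragment pasted into a CMSSW configuration. The `export_file_list` helper and the example file names are hypothetical placeholders, not the portal's actual output; only the `cms.Source("PoolSource", fileNames = cms.untracked.vstring(...))` layout is standard CMSSW configuration syntax.

```python
from typing import Iterable


def export_file_list(file_names: Iterable[str]) -> str:
    """Return a CMSSW PoolSource fragment listing the given ROOT files.

    Hypothetical export helper: only names ending in ".root" are kept,
    anything else is skipped.
    """
    names = [name for name in file_names if name.endswith(".root")]
    if not names:
        raise ValueError("no ROOT files to export")
    # One quoted file name per line, separated by commas.
    body = ",\n".join(f"        '{name}'" for name in names)
    return (
        'process.source = cms.Source("PoolSource",\n'
        "    fileNames = cms.untracked.vstring(\n"
        f"{body}\n"
        "    )\n"
        ")\n"
    )


if __name__ == "__main__":
    # Placeholder file names; real ones would come from the dataset record.
    print(export_file_list([
        "root://example.host//path/to/dataset/file_0001.root",
        "root://example.host//path/to/dataset/file_0002.root",
    ]))
```

The fragment assumes the surrounding config already does `import FWCore.ParameterSet.Config as cms` and defines `process`, as CMSSW configs normally do.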

Ana's analysis

is derived from two high-level (primary) datasets [the same is the case for Tom's examples]
is available on GitHub: a) the exercise itself https://github.com/ayrodrig/OutreachExercise2010 b) the pattuples production https://github.com/ayrodrig/pattuples2010
Ana's code should become a record by itself, too, also with a DOI [following Zenodo's GitHub integration]
these records will also have their own "how-to" in the box on the right [see 1-4 above]
there should be enough metadata to create such a record: the author lists are the same
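To make the links around Ana's analysis concrete, here is a minimal sketch of how a derived record could point back to its primary datasets and code, so that one can check whether everything needed to reproduce it is reachable. The `Record` fields, the `missing_for_reproduction` helper and the parent IDs `primary-1`/`primary-2` are hypothetical; only the two GitHub URLs come from the notes above.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    recid: str                 # hypothetical record identifier
    kind: str                  # e.g. "primary", "derived", "software"
    derived_from: list[str] = field(default_factory=list)  # parent recids
    code_urls: list[str] = field(default_factory=list)     # e.g. GitHub repos
    how_to: str = ""           # text shown in the "how to" box


def missing_for_reproduction(record: Record, catalogue: dict[str, Record]) -> list[str]:
    """Return what a user would still lack to reproduce `record`."""
    missing = []
    if not record.how_to:
        missing.append("how-to")
    if record.kind == "derived" and not record.code_urls:
        missing.append("code")
    # Every parent dataset must itself be a record on the portal.
    for parent in record.derived_from:
        if parent not in catalogue:
            missing.append(f"dataset {parent}")
    return missing


# Ana's analysis: two primary parents (placeholder IDs) plus the two repositories.
catalogue = {rid: Record(rid, "primary") for rid in ("primary-1", "primary-2")}
ana = Record(
    "ana-analysis", "derived",
    derived_from=["primary-1", "primary-2"],
    code_urls=[
        "https://github.com/ayrodrig/OutreachExercise2010",
        "https://github.com/ayrodrig/pattuples2010",
    ],
    how_to="selection / validation / how to reuse / limitations",
)
print(missing_for_reproduction(ana, catalogue))  # []
```

An empty list means the record links to its data, its code and a how-to, which is roughly what the "can you reproduce the analysis?" UX task asks a user to find by hand.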

Overall tasks and next steps

set up Laura's design
set up HTML editing pages for additional info on GitHub
prepare a separate menu for additional information, where we can put some nice additional documentation
prepare the additional boxes on the right of a detailed record page
check export functionalities (see comment on titles above)
meeting at the beginning of next week for a documentation sprint (with Achintya and Patricia)
meeting at the beginning of next week with Pamfilos for a design sprint

UX/UI testing tasks

navigation on the portal
navigation from primary and reduced data
one task: can you reproduce the analysis? [is the user able to find all the related information, data, code, "how-to" for the particular analysis?]

Metadata related tasks

compile metadata for software
compile metadata for virtual image
populate the records for the 14 primary datasets
integrate Ana's analysis

katilp commented 10 years ago

Just a clarification for the second point: for the guided tour, to start with, this document will be the same for all primary datasets (this may change later). But it will be different for derived datasets (e.g. the instructions connected to the "pattuples" derived by Ana will point to the code and to the instructions on how to run it).

suenjedt commented 10 years ago

Thanks Kati, changed that :)

RaoOfPhysics commented 10 years ago

@pherterich, @suenjedt, @katilp: Please confirm a suitable time for the documentation sprint next week. Options for me: http://doodle.com/e5sgctgu5mxunuki#calendar

katilp commented 10 years ago

Further elaboration of the four information areas which should accompany each element on the portal:

1) where did this come from 2) how was it validated 3) how to use it 4) limitations

For primary datasets these would contain: 1) trigger selections; 2) a general statement on the data validation (possibly, in the future, validation plots, which will be needed if the data or the software need to be migrated); 3) the guided tour doc with explanations of the data content and of how to do an analysis; 4) whatever else is needed.

For derived datasets: 1) the code that was used to produce them from the primary datasets; 2) possibly an expected result to compare against after step 3); 3) a pointer to the application (event display, histogramming, analysis example code) and the instructions; 4) whatever else is needed (e.g. the physics object selections may not be the official CMS recommendations, etc.).

For the VM image: 1) some explanation of how the image was built (e.g. a link to CernVM...); 2) Anssi's report; 3) the prerequisite text from https://twiki.cern.ch/twiki/bin/view/CMS/DPOAVMUserInstructions#Prerequisites; 4) unsolved problems found by Anssi, if any.

For the CMSSW code examples (e.g. those producing the event display files, and Ana's two levels): 1) a statement that this code runs in CMSSW version N; 2) possibly a reference plot, a result or the expected output from step 3); 3) instructions on how to run; 4) whatever else is needed.

For the applications (e.g. histogramming, event display, others): 1) the underlying packages and tools; 2) a reference plot/figure from running step 3); 3) a pointer to the source code and instructions on how to run it (needed for "external" developers); 4) whatever else is needed.
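The per-type breakdown above amounts to a small table: every record type fills the same four areas. A minimal sketch of that structure, assuming the box is rendered as numbered lines; the dictionary keys and the condensed wording are illustrative, not an agreed schema.

```python
AREAS = ("where it came from", "how it was validated", "how to use it", "limitations")

# Condensed from the comment above; keys and wording are illustrative only.
AREA_CONTENT = {
    "primary dataset": (
        "trigger selections",
        "general statement on data validation",
        "guided tour: data content and how to do an analysis",
        "whatever else is needed",
    ),
    "derived dataset": (
        "code used to produce it from the primary datasets",
        "expected result to compare with after step 3",
        "pointer to the application and instructions",
        "e.g. object selections may not be official CMS recommendations",
    ),
    "VM image": (
        "how the image was built (e.g. link to CernVM)",
        "Anssi's report",
        "prerequisites from the DPOA VM user instructions",
        "unsolved problems found by Anssi, if any",
    ),
    "CMSSW code example": (
        "CMSSW version the code runs in",
        "reference plot, result or expected output",
        "instructions on how to run",
        "whatever else is needed",
    ),
    "application": (
        "underlying packages and tools",
        "reference plot/figure after running it",
        "pointer to source code and run instructions",
        "whatever else is needed",
    ),
}

# Every record type must fill all four areas.
assert all(len(v) == len(AREAS) for v in AREA_CONTENT.values())


def info_box(record_type: str) -> str:
    """Render the four-area box for one record type as numbered lines."""
    items = zip(AREAS, AREA_CONTENT[record_type])
    return "\n".join(f"{i}) {area}: {text}" for i, (area, text) in enumerate(items, start=1))


print(info_box("VM image"))
```

Keeping the four areas fixed and only varying their content per type mirrors the idea that the structure stays the same while the texts differ.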

RaoOfPhysics commented 10 years ago

@suenjedt, @katilp, @pherterich: We meet on 2 September (Tuesday) at 15:00 in R1 (unless you prefer a proper meeting room?).

TimSmithCH commented 10 years ago

In addition to the disclaimer, all data records should clearly display the copyright statement and the licence for reuse.

suenjedt commented 10 years ago

Indeed. The official label for CC0, which is the licence being used here (so far), is available at http://creativecommons.org/about/downloads

katilp commented 10 years ago

Do we already have an area on GitHub for editing the Additional information text?

suenjedt commented 10 years ago

@tiborsimko: you mentioned an easy editing functionality for HTML content that we could use for the information texts. Could you point me/us to it so we can get started? Thanks!

tiborsimko commented 10 years ago

@suenjedt @katilp Thanks for the meeting write-up and further elaboration. It would be useful to turn these notes into a series of independent issues/tasks, so that:

we could assign issues to different persons depending on who will deal with the issue at hand;
we could assign issues to different milestones to monitor time-based progress.

Do you think you could split these into independent issues according to the topic?

As an example, I started independent tasks for VM images, see #47 and #48.

tiborsimko commented 10 years ago

> you mentioned this easy editing functionality for html stuff here we could use for the information texts. Could you point me/us to it so we can get started? Thanks!

Here are quick instructions:

Say you'd like to edit the "Visualise Events" page, which is here:

http://open-data-demo.cern.ch/visualise/events

You would locate this page's template in the source code under the base/templates directory, either by browsing that directory directly or by searching for strings that occur on the web page, which will bring you here:

https://github.com/tiborsimko/open-data.cern.ch/blob/pu/invenio_opendata/base/templates/visualise_events.html
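The "search for strings that occur on the web page" step can also be done on a local clone instead of through GitHub's search. A minimal sketch, assuming a checkout of the repository; `find_templates` is a hypothetical helper, not part of Invenio. A string you see on the page may also come from a parent template or from the database, so an empty result does not prove the text is absent from the code.

```python
from pathlib import Path


def find_templates(templates_dir: str, snippet: str) -> list[str]:
    """Return template paths (relative to templates_dir) whose source contains snippet."""
    root = Path(templates_dir)
    if not root.is_dir():
        raise FileNotFoundError(templates_dir)
    hits = []
    for path in sorted(root.rglob("*.html")):
        # errors="replace" so a stray non-UTF-8 byte does not abort the search
        text = path.read_text(encoding="utf-8", errors="replace")
        if snippet in text:
            hits.append(path.relative_to(root).as_posix())
    return hits
```

From the root of a clone, something like `find_templates("invenio_opendata/base/templates", "Visualise Events")` would list the candidate templates, assuming the heading text appears literally in the template source.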

Now click the Edit icon on the right-hand side, which opens a basic file editor on GitHub. The editor lets you edit the page source (in HTML), e.g. to copy/paste HTML text into the template.

Note that the GitHub editor will help you edit the HTML, e.g. the opening/closing of elements like <ul>...</ul>, but the "preview" button will not show you the page as it will actually look; for this, one has to preview the page via the Invenio application. (*)

You then save your edits and issue a pull request, which we will check, review, and deploy. (Note that issuing a pull request assumes that you first forked this repository into your own space; just use the "Fork" button in the top right.)

See also various GitHub guides like:

(*) Otherwise it may be easier to edit the page body in some easy-to-use markup format, such as reStructuredText, which would offer a simple preview. However, for this we would have to change the layout of the templates in the repository. Perhaps you can give the current HTML-only version a try and see if it is OK with you?

katilp commented 10 years ago

This requires that the page exists. We would then need the following areas; for the sake of clarity, I will open a separate issue with the list of pages that we think we need urgently.

tiborsimko commented 10 years ago

> This requires that the page exists: we would need the following areas then, for the sake of clarity I will make a separate issue with the list of pages that we think we need urgently

Yes, thanks. In order for them to appear on the site, we'd need to create corresponding templates and add some "glue" to the system. Basically the pages will all appear flattened here:

https://github.com/tiborsimko/open-data.cern.ch/tree/pu/invenio_opendata/base/templates
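Once the list of needed pages exists, a quick check of which ones still lack a template could look like this. It assumes that URL paths map to flattened template names by joining the path segments with underscores, which is inferred only from the single example in this thread (`/visualise/events` to `visualise_events.html`); `template_name`, `missing_templates` and the page `/about/cms` are hypothetical. The "glue" that makes a template reachable at a URL is not covered here.

```python
from pathlib import Path


def template_name(url_path: str) -> str:
    """Map a URL path to a flattened template file name (assumed convention).

    '/visualise/events' -> 'visualise_events.html', mirroring the one
    example in this thread; other pages may not follow this pattern.
    """
    parts = [part for part in url_path.strip("/").split("/") if part]
    if not parts:
        raise ValueError(f"cannot derive a template name from {url_path!r}")
    return "_".join(parts) + ".html"


def missing_templates(url_paths: list[str], templates_dir: str) -> list[str]:
    """Return the requested URL paths that have no template file yet."""
    root = Path(templates_dir)
    return [p for p in url_paths if not (root / template_name(p)).is_file()]
```

For example, `missing_templates(["/visualise/events", "/about/cms"], "invenio_opendata/base/templates")` would return the pages that still need a template before any routing is added.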

suenjedt commented 10 years ago

OK, will do tomorrow, hopefully. Sorry for our absence today; we have a proposal submission tomorrow.

From: Tibor Simko [notifications@github.com] Sent: 01 September 2014 15:46 To: tiborsimko/open-data.cern.ch Cc: Sunje Dallmeier-Tiessen Subject: Re: [open-data.cern.ch] Meeting August 28th (#41)


tiborsimko commented 10 years ago

Closing this "meta-topical issue" that had been further individualised into separate topical issues (which were either done or for which we are tracking progress elsewhere).
