cernopendata / opendata.cern.ch

Source code for the CERN Open Data portal
http://opendata.cern.ch/
GNU General Public License v2.0

records: filter away some keywords values #1599

Closed tiborsimko closed 6 years ago

tiborsimko commented 7 years ago

The keyword facet looks like this:

[image: keywords]

Some values here, such as experiment names, do not really belong in the keywords field.

We could filter them away during COD2 -> COD3 record migration.
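A minimal sketch of what such a filter could look like, assuming records are plain dicts with a `keywords` list and that the experiment names to drop are known in advance. The field name, the `EXPERIMENT_NAMES` set and the `filter_keywords` helper are illustrative assumptions, not the actual COD3 schema or migration code:

```python
# Hypothetical set of experiment names that should not be kept as keywords.
EXPERIMENT_NAMES = {"alice", "atlas", "cms", "lhcb", "opera"}


def filter_keywords(record):
    """Return a copy of the record with experiment names removed from 'keywords'."""
    keywords = record.get("keywords", [])
    # Compare case-insensitively so that "CMS" and "cms" are both dropped.
    kept = [kw for kw in keywords if kw.strip().lower() not in EXPERIMENT_NAMES]
    migrated = dict(record)
    if kept:
        migrated["keywords"] = kept
    else:
        # Drop the field entirely rather than keeping an empty list.
        migrated.pop("keywords", None)
    return migrated
```

The function returns a new dict instead of modifying the input, so the original COD2 record stays available for comparison during the migration.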

We could also profit from the migration to enrich this field programmatically, if we discuss how.

CC @ArtemisLav @katilp

katilp commented 7 years ago

Are the keywords always in one generic keyword facet? I.e., if we define keywords such as Drell-Yan, QCD, Vector boson, W, Z, photon, Higgs, etc., will they appear under a generic "Filter by keyword" facet, or is it possible to have them expanded only for Simulated datasets? In other words, can the keywords be part of a specific group only (in this case Simulated datasets)?

tiborsimko commented 7 years ago

If we store them in the global keyword field, then they would always be displayed. Otherwise it would be impossible to know when to display them and when not, for example when someone searches for "2011", which returns all the various kinds of ATLAS records as well as CMS collision and simulated datasets.

If you would like to separate them, we can store them elsewhere; but I would call for simplicity and use the generic keyword field that would be always displayed...

katilp commented 7 years ago

OK, fine. Can the relevant set of keywords still be expanded after Simulated datasets (in this use case)? Or else always have them with MC, i.e. "QCD MC", "Drell-Yan MC", etc.? Otherwise it will not be clear to users that they are intended as MC search help.

katilp commented 7 years ago

Now as I see it, the MC selection stuff does not come out right. The topic category as it appears now is confusing: the user would not know what it is about, i.e. it is not at all clear that SM Higgs or whatever relates to Simulated datasets. The list should not be seen as a list of general keywords (the primary datasets contain SM Higgses as well, some other primary datasets are B-physics oriented, and some validation examples are relevant to SM inclusive, etc., but that's not what we intend with them). I'm not sure what to suggest, but it should be made clear that these selections only make sense when searching the MC type. Maybe change the title "Filter by topic category" to "Filter by MC category". Furthermore, I would not then know how to combine this with the new search keywords; the easiest is probably to just have them all listed there (when in place). I'm sorry I did not come up with this in our earlier discussion; I had this fixed idea of having this selection displayed only after "Simulated datasets", in which case the "topic category" would have been OK.

tiborsimko commented 7 years ago

We have basically two options:

or:

It may be easier to talk this over live? There are various pros and cons...

katilp commented 7 years ago

OK, I would not be in favour of applying such category titles to other types of records, as it would not be unambiguous and would probably result in confusion. What do you think of just changing the "Filter by" naming to "Filter by simulated dataset type"? Or else your solution two? Can the current "MC categories" become keywords?

tiborsimko commented 6 years ago

Short summary: we can change the facet name, and the facet category would be displayed only when there is some MC result for the search query. This should do for initial search tests. And we agreed to remove "experiment" keywords; I'm going to make a PR for that.
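The conditional display could be sketched roughly as follows, assuming each search hit is a dict carrying a `type` field and that facets are identified by name. The constants `MC_RECORD_TYPE` and `MC_FACET` and the `visible_facets` helper are hypothetical names used only for illustration, not the portal's actual implementation:

```python
# Hypothetical values: how an MC record is marked and what the MC facet is called.
MC_RECORD_TYPE = "Simulated"
MC_FACET = "category"


def visible_facets(hits, facets):
    """Return the facet names to display, hiding the MC facet unless an MC hit exists."""
    has_mc = any(hit.get("type") == MC_RECORD_TYPE for hit in hits)
    return [name for name in facets if name != MC_FACET or has_mc]
```

With this rule a search that returns only collision datasets shows no MC category facet, while a mixed result set (such as a search for "2011") still shows it.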