cessda / cessda.cdc.versions

Issue tracker and wiki for the CESSDA Data Catalogue
https://datacatalogue.cessda.eu/
Apache License 2.0

Add dataset type #567

Open MortenSikt opened 1 year ago

MortenSikt commented 1 year ago

A suggestion from the CDC upgrade report is to include information on what kind of data the study contains. Primary emphasis here is on the distinction between quantitative and qualitative data.

A review of the CMM found no suggested element for this information, perhaps because the model focuses on quantitative social science data.

Review of SP practice in the catalogue:

AUSSDA

Quantitative: https://datacatalogue.cessda.eu/detail?lang=en&q=b7b94f779a79464ef7fe6907a8688c495f019376d15b0cf467e7c861558bbf63
Qualitative: https://datacatalogue.cessda.eu/detail?lang=en&q=06d8e3a5ce746368afcd6579b03375ea6862dc16551f760a1d86951d63a55663

x-path: codeBook/stdyDscr/stdyInfo/sumDscr/dataKind
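The x-path above could be read from a harvested record roughly as follows. This is a minimal sketch, assuming a DDI-Codebook 2.5 record in the `ddi:codebook:2_5` namespace; the sample XML and the function name `extract_data_kinds` are made up for illustration and are not taken from the CDC indexer.

```python
import xml.etree.ElementTree as ET

# Assumption: the record is DDI-Codebook 2.5, which uses this namespace.
NS = {"ddi": "ddi:codebook:2_5"}

# Path from the issue, relative to the codeBook root element.
DATA_KIND_PATH = "ddi:stdyDscr/ddi:stdyInfo/ddi:sumDscr/ddi:dataKind"

# Hypothetical sample record, trimmed to the elements on the path.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <stdyInfo>
      <sumDscr>
        <dataKind>Numeric</dataKind>
        <dataKind>Text</dataKind>
      </sumDscr>
    </stdyInfo>
  </stdyDscr>
</codeBook>"""


def extract_data_kinds(xml_text):
    """Return the non-empty text of every dataKind element in the record."""
    root = ET.fromstring(xml_text)  # root is the codeBook element
    values = []
    for element in root.findall(DATA_KIND_PATH, NS):
        text = (element.text or "").strip()
        if text:
            values.append(text)
    return values


print(extract_data_kinds(SAMPLE))
```

A record without the element simply yields an empty list, which matches the situation of SPs that do not use dataKind.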

Values of field according to catalogue (seems like DDI General Data Format):

  1. Numeric
  2. Text
  3. Numeric; Text
  4. Still image
  5. Text; Numeric
  6. Still image; Numeric
  7. Interactive resource
  8. Numeric, Text
  9. Other
  10. Software
  11. Still image; Text
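Several of the values above are combinations of the same terms written with different separators or in a different order ("Numeric; Text", "Text; Numeric", "Numeric, Text"). A minimal sketch of how such values might be split into individual terms, assuming that ";" and "," are the only separators in use and that no single term contains a comma; `split_data_kind` is a hypothetical helper:

```python
import re

# Separators seen in the AUSSDA values above: ";" and ",".
_SEPARATORS = re.compile(r"[;,]")


def split_data_kind(raw):
    """Split a combined value such as 'Numeric; Text' into sorted, unique terms."""
    terms = {part.strip() for part in _SEPARATORS.split(raw) if part.strip()}
    return sorted(terms)


for raw in ["Numeric; Text", "Text; Numeric", "Numeric, Text", "Still image"]:
    print(raw, "->", split_data_kind(raw))
```

With this normalisation the eleven catalogue values would reduce to combinations of six distinct terms.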

DANS-KNAW

Element does not seem to be in use by them in their dataverse/endpoint

DASSI

Quantitative data: https://datacatalogue.cessda.eu/detail?lang=en&q=5f58f360eef59e9672715b27553027adcb3dc2e3c12d588a9ff882fa61a82223
Qualitative data: https://datacatalogue.cessda.eu/detail?lang=en&q=5e81dc432c1e01654a8ad9c30f3211067b2d14af46f7cdc8809e7a8a8c7c6e62

x-path: codeBook/stdyDscr/stdyInfo/sumDscr/dataKind

Values of field according to catalogue:

  1. Individual data
  2. Aggregate data
  3. Other

The element cannot be used to distinguish between quantitative and qualitative data, as "Individual data" is used for all kinds of individual-level data, both quantitative and qualitative.

FORS

Quantitative data: https://datacatalogue.cessda.eu/detail?lang=en&q=d24c786574961a004f1fc6a25b6a64c7d34fc654ce448f518f5cb50441245d5c
Qualitative data: https://datacatalogue.cessda.eu/detail?lang=fr&q=e29fcbf03909158611eaabfb4d8466174ffe3ca0e538b1c47988169b64312387

The information is unavailable in OAI-PMH, but the SP's landing page shows the data type: https://www.swissubase.ch/en/catalogue/studies/13413/latest/datasets/1004/1545/dynamicBlock/1 and https://www.swissubase.ch/en/catalogue/studies/13853/19076/datasets/2232/2573/dynamicBlock/1

The information is placed at dataset level, not study level. Unsure which record is harvested for the CDC.

FSD

Quantitative data: https://datacatalogue.cessda.eu/detail?lang=en&q=9e92dd91a4047547fa3fd53105340f75cb0f6409b6007ce608524e38dda234b3
Qualitative data: https://datacatalogue.cessda.eu/detail?lang=en&q=462bad237835a9d770d3a56e35a9731410edf683140f2af536a6cc55bea8f891

x-path: codeBook/stdyDscr/stdyInfo/sumDscr/dataKind

Also defined in setSpec.

Values of field according to catalogue

  1. Quantitative
  2. Qualitative

GESIS

Unsure whether they have qualitative data. They do implement the Kind of Data element (https://search.gesis.org/?source=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%7B%22match_all%22%3A%7B%7D%7D%2C%22filter%22%3A%5B%7B%22term%22%3A%7B%22type%22%3A%22research_data%22%7D%7D%2C%7B%22term%22%3A%7B%22kind_data_en.keyword%22%3A%22Text%22%7D%7D%5D%7D%7D%7D&lang=en), which is unavailable in OAI-PMH, and use the following values (short list of DDI General Data Format):

  1. Numeric
  2. Text
  3. Geospatial
  4. Other

However, data marked as Text are still considered quantitative data. The value seems to be tied largely to the kinds of variables in the dataset rather than to the data itself.

PROGEDO

Quantitative study: https://datacatalogue.cessda.eu/detail?lang=en&q=41f398fce70fe5af6db1ebd1414a3a8021d4e0ecf8bf1252ae3295f28bd25393
Qualitative study: https://datacatalogue.cessda.eu/detail?lang=fr&q=74b8846bbc870ee022b8f0df4833f10c8c28abaf6973221441a0b88e34c65600

x-path: codeBook/stdyDscr/stdyInfo/sumDscr/dataKind

Values of field according to catalogue (DDI General Data Format)

  1. Numeric
  2. Text
  3. Still image
  4. Geospatial
  5. Audio
  6. Video
  7. Software
  8. Interactive resource
  9. 3D
  10. Other

Studies seem to be able to have more than one value from the CV; the quantitative example is tagged with both Numeric and Text.
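One way to see how closely an SP follows the CV is to check harvested values against the term list above. A minimal sketch, assuming the ten terms listed for PROGEDO are the complete DDI General Data Format list and that matching is exact and case-sensitive; `partition_by_cv` is a hypothetical helper:

```python
# Terms as listed for PROGEDO above (DDI General Data Format).
GENERAL_DATA_FORMAT = {
    "Numeric", "Text", "Still image", "Geospatial", "Audio",
    "Video", "Software", "Interactive resource", "3D", "Other",
}


def partition_by_cv(values):
    """Split values into (terms found in the CV, everything else)."""
    known = [v for v in values if v in GENERAL_DATA_FORMAT]
    unknown = [v for v in values if v not in GENERAL_DATA_FORMAT]
    return known, unknown


print(partition_by_cv(["Numeric", "Text"]))
print(partition_by_cv(["Individual data", "Other"]))
```

The second call illustrates a caveat: a generic term such as "Other" also appears in lists that are not based on this CV, so a match alone does not prove the SP uses it.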

APIS

The element does not seem to be in use: the catalogue does not show it, and there is no reference to an endpoint in the JSON.

SASD

Quantitative study: https://datacatalogue.cessda.eu/detail?lang=en&q=1bf9e812d15ea729196121c1f81fef3a5107e7bac4694a51dfd88fb2b1124b55
Qualitative study: unable to locate

x-path: codeBook/stdyDscr/stdyInfo/sumDscr/dataKind

Unable to identify the values used from their catalogue, but the quantitative study is marked with "survey data" in the dataKind element.

ADP

Unable to establish a connection between the CDC and the ADP catalogue, but the Nesstar instance has marked dataKind with "survey data".

Not able to identify qualitative data

SoDaNet

Quantitative study: https://datacatalogue.cessda.eu/detail?lang=en&q=c63266b0a9ba9924b5a8f9b2b98e1570e90bf14d1646e04cab23d147082a4a4c
Qualitative study: https://datacatalogue.cessda.eu/detail?lang=el&q=c01f5b23464aa88263c7ef7312c84df1f473caf33aa23adbfa47bb585cb346c7

Unable to identify the values used from their catalogue for the dataKind element, which does not seem to be fully implemented either.

x-path: codeBook/stdyDscr/stdyInfo/sumDscr/dataKind

They have a different grouping of kinds of data (Data Project Category) at a higher level, outside the DDI records, with the following values:

  1. Quantitative Study without Data
  2. Qualitative Study without Data
  3. Cube
  4. Quantitative Study with Data
  5. Indices & Classifications
  6. Statistical Data
  7. Qualitative Study with Data
  8. Corpora
  9. Replication for Quantitative Analysis
  10. Mixed Study without Data
  11. Mixed Study with Data

Not sure whether this information can be accessed for the records being harvested, but it seems to be the best element to use for them.

SODHA

Quantitative study: https://datacatalogue.cessda.eu/detail?lang=en&q=c27d6a2126941c7d4f5222b9a58c8434d57ad20df158f66f40eb1537c9ea5369
Qualitative study: https://datacatalogue.cessda.eu/detail?lang=en&q=48532d2f5ed5e2fef7de4e48ed5f94ea11a0acb23115bf7da6ec85461b0bda11

x-path: codeBook/stdyDscr/stdyInfo/sumDscr/dataKind

Values of field according to catalogue (does not look like a CV)

  1. Computer code
  2. Administrative records
  3. Survey Data
  4. Survey data
  5. Administration record data
  6. Anonymized record data
  7. Anonymized interview transcripts
  8. Anonymized transcripts - Uncoded
  9. Budget surveys
  10. CSV
  11. Death duty forms ....
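The list above contains entries that differ only in letter case ("Survey Data" and "Survey data"), which is one sign that the field is free text rather than a CV. A minimal sketch of how such variants could be detected, assuming only case and surrounding whitespace are compared; `group_case_variants` is a hypothetical helper, and near-synonyms such as "Administrative records" and "Administration record data" are deliberately not merged:

```python
from collections import defaultdict


def group_case_variants(values):
    """Group values that differ only in letter case or surrounding whitespace."""
    groups = defaultdict(list)
    for value in values:
        groups[value.strip().casefold()].append(value)
    # Keep only keys that were spelled in more than one way.
    return {
        key: sorted(set(spellings))
        for key, spellings in groups.items()
        if len(set(spellings)) > 1
    }


sodha_values = [
    "Computer code", "Administrative records",
    "Survey Data", "Survey data", "CSV",
]
print(group_case_variants(sodha_values))
```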

SND

Quantitative study: https://datacatalogue.cessda.eu/detail?lang=en&q=c118ceac92700c5695b5b21d775dcc70e7294a7eafb9aac6556e1a26695f09ba
Qualitative study: https://datacatalogue.cessda.eu/detail?lang=en&q=acd9a705db4dc46bb08643085a2fd73ba4554c929515e3dd9c79781cd8660d91

OAI is inaccessible, and the SP's own DDI export does not include this information.

x-path: unknown

Values of field according to catalogue (DDI General data format)

  1. Numeric
  2. Text
  3. Geospatial
  4. Still image
  5. Other
  6. 3D
  7. Audio
  8. Video
  9. Software
  10. Interactive resource

They mark data with both Numeric and Text, so it is unclear whether the element can be used to differentiate.

UKDS

Quantitative data: https://datacatalogue.cessda.eu/detail?lang=en&q=3807eb5669b0eb5700029ca6b30b34ff9cb039c974ceafb76beb567b4ce8fe5e
Qualitative data: https://datacatalogue.cessda.eu/detail?q=00118d17c3ee788faffd0be5d6c79830db8670743f1bcc69043163970741af24

x-path: codeBook/stdyDscr/stdyInfo/sumDscr/dataKind

Unsure of values of element, but seems like DDI General data format (not filterable)

Instead, they implement a different filter in their catalogue with the following values (inaccessible in the OAI/DDI record):

  1. UK Survey data
  2. Qualitative and mixed methods data
  3. Cohort and longitudinal studies
  4. Historical data
  5. Experimental data
  6. Time series data
  7. Other surveys
  8. Cross-national survey data
  9. Census data
  10. Geospatial data
  11. Business microdata
  12. Teaching data
  13. International macro data
  14. International Data Access Network
  15. Administrative data

MortenSikt commented 1 year ago

The primary element used is dataKind, which is structured similarly for most SPs. To be discussed: whether we can do something in the catalogue for SPs where this information is available.

MortenSikt commented 9 months ago

Katja mentions that elements for this exist within the CMM: element 1.2.4 Data type and element 1.2.5 Data format. The CMM does not explicitly state that DDI General Data Format should be used.

For 1.2.4, the CMM mentions @type with the values "Qualitative", "Quantitative" and "Mixed" for DDI-L, and specifies dataKind for DDI-C. Note that 1.2.5 is not available for DDI 2.5 and 3.2.

KristinaS4 commented 1 month ago

In the User Group (29.05.2024) it was decided that this field should be placed between sampling procedure and data collection mode in the Methodology section of the detailed study page. The label should be "Kind of data".

markusjt commented 1 week ago

I've created PRs for the indexer and searchkit, but I'll leave them in draft mode for now and look at them again in August after my vacation.

The indexer PR basically adds parsing and indexing:

All of these are combined into one array and then, in the searchkit PR, shown between sampling procedure and data collection mode in the Methodology section of the detailed study page under the label "Kind of data". They are shown like other multivalue fields, one value after another. Type values are shown first, then general data format term values, and data kind text values last, since I think it makes sense to show Quantitative/Qualitative/Mixed first.
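The ordering described above can be sketched as follows. This is only an illustration, not the code in the PRs: the function and parameter names are hypothetical, and dropping repeated values is an assumption the comment does not state.

```python
def combine_kind_of_data(type_values, general_data_format_terms, data_kind_texts):
    """Merge the three sources into one list, in the display order described above."""
    combined = []
    seen = set()
    # Type values first, then general data format terms, then free-text values.
    for value in [*type_values, *general_data_format_terms, *data_kind_texts]:
        if value not in seen:  # assumption: a repeated value is shown only once
            seen.add(value)
            combined.append(value)
    return combined


print(combine_kind_of_data(["Quantitative"], ["Numeric", "Text"], ["Survey data"]))
```

Keeping the three inputs separate until this step makes the display order explicit and easy to change if the User Group prefers a different one.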