Open davanstrien opened 2 years ago
Happy to help anyone who wants to work on this. I have a WIP loading script for another COCO formatted dataset: https://huggingface.co/datasets/biglam/nls_chapbook_illustrations
Also, I really want to call this dataset smelly_objects
...
I'd love to work on this! Will be a good change from the text datasets so far.
Awesome, and don't worry if you can't finish this before you go away. It can wait until you're back too 🙂
Hopefully, I should be able to get it done. From the Zenodo page:
Due to licensing issues, we cannot provide the images directly, but instead provide a collection of links and a download script.
Should the dataset just contain the links to the images then?
Yes I think that would be best for this one. We can provide example code for downloading the images in the datacard.
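A minimal sketch of what such datacard example code could look like, assuming the metadata has already been reduced to (file name, URL) pairs. The names `fetch_bytes` and `download_images` are hypothetical and not taken from the dataset's own download script. The timeout matters here because some of the source hosts respond slowly or not at all.

```python
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=10.0):
    """Download one URL, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_images(rows, out_dir, fetch=fetch_bytes):
    """Save each (file_name, url) pair into out_dir.

    Returns a list of (url, error message) for downloads that failed,
    so one dead link does not abort the whole run.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    failed = []
    for file_name, url in rows:
        try:
            data = fetch(url)
        except OSError as err:  # URLError, timeouts, and connection errors subclass OSError
            failed.append((url, str(err)))
            continue
        # Keep only the final path component so a file name cannot escape out_dir.
        (out_path / Path(file_name).name).write_bytes(data)
    return failed
```

Passing a different `fetch` function makes it possible to try the loop without touching the network, and the returned list shows which links need a retry later.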
@davanstrien This dataset has a lot of associated metadata
['File Name', 'Artist', 'Title', 'Query', 'Part', 'Earliest Date',
'Latest Date', 'Margin Years', 'Genre', 'Material', 'Medium',
'Height of Image Field', 'Width of Image Field', 'Type of Object',
'Height of Object', 'Width of Object', 'Diameter of Object',
'Position of Depiction on Object', 'Current Location',
'Repository Number', 'Original Location', 'Original Place',
'Original Position', 'Context', 'Place of Discovery',
'Place of Manufacture', 'Associated Scenes', 'Object Categories',
'Related Works of Art', 'Type of Similarity', 'Inscription',
'Text Source', 'Bibliography', 'Photo Archive', 'Image URL',
'Details URL', 'Additional Information']
Should they all be included in the dataset? Most of them are missing, from a cursory glance at the data. Current Location, Earliest Date, Latest Date, Genre, Material and Medium are populated for most of the images. I was thinking some of the fields, like Material and Medium, could be used for classification, maybe.
My own feeling would be to include as much as possible. One option if things are often missing would be to put some of this metadata in an additional metadata column as a dictionary? This way it doesn't get lost but also is slightly less distracting than having a lot of columns with mostly missing data?
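To make the idea concrete, here is a rough sketch in plain Python, assuming each row of the metadata file has been read into a dict keyed by the column names listed above. The choice of `CORE_FIELDS` and the helper `split_record` are illustrative assumptions, not part of any existing loading script.

```python
# Columns reported in the thread as populated for most images, plus the
# file name and image URL; which columns count as "core" is an assumption.
CORE_FIELDS = (
    "File Name",
    "Image URL",
    "Earliest Date",
    "Latest Date",
    "Genre",
    "Material",
    "Medium",
    "Current Location",
)


def split_record(row):
    """Keep core fields as top-level columns and move the rest into 'metadata'."""
    # Empty strings are treated as missing values for the core columns.
    record = {name: row.get(name) or None for name in CORE_FIELDS}
    # Only sparse fields that actually have a value are carried along.
    record["metadata"] = {
        key: value
        for key, value in row.items()
        if key not in CORE_FIELDS and value not in (None, "")
    }
    return record
```

Note that the resulting `metadata` dict has different keys from row to row. If the column needs a fixed schema, one option would be to declare every sparse key up front, or to store the dict as a JSON string.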
Yeah, I was building out the features as follows:
import datasets

# Sparse fields are grouped under "metadata", as suggested above.
features = datasets.Features(
    {
        "id": datasets.Value("string"),
        "url": datasets.Value("string"),
        "annotations": datasets.Value("string"),
        "date": datasets.Value("string"),
        "genre": datasets.Value("string"),
        "material": datasets.Value("string"),
        "metadata": {
            "artist": datasets.Value("string"),
            "query": datasets.Value("string"),
            "title": datasets.Value("string"),
            "height": datasets.Value("string"),
            "width": datasets.Value("string"),
        },
    }
)
I'll probably get back to this in about two weeks, after I come back from vacation
Have a great vacation!
@davanstrien I'm back to working on this dataset, but it seems like the URLs aren't accessible. Even the download script provided in the dataset gives the following error:
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
Example from the first image in the metadata document:
URL: http://www.sigecweb.beniculturali.it/images/fullsize/ICCD50007114/ICCD4644613_SBAS%20RM%20223305.jpg
@shamikbose hey, hope you had a good break!
I'll try and take a look at this too but also tagging @kiymetakdemir who works on this project and might be able to help with this.
@davanstrien I did! It was a much-needed break. Thanks for adding @kiymetakdemir. Hoping this data can still be accessed.
Hi @shamikbose, can you check it again? I just tried downloading the images with the given script and didn't encounter any errors; everything downloaded successfully.
@kiymetakdemir I was able to download them today. Thanks!
@kiymetakdemir I get an error for this URL (http://134.76.24.240/download/07876601/flc0596164z_p?Expires=1610722060&Signature=SX15SE0B~KbZ7yvkTJtis1rsKysZddvhsxJzZSZ7oZoxqd~NNsKp22iYZGBQViGXMy7zwTDCYxu-Qan2O0aq2QxizENey~CF4WIV5-~bHwEZZjrmCoBdWDEeS0Y6XNajZ6DYzWQolxkiGWoqLs~Bw0j4GSrQef7QvgQciIWDlTE_&Key-Pair-Id=APKAJGHHKKX2FHRP63AQ). It's not accessible.
Update: The links from www.sigecweb.beniculturali.it are timing out again.
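One possible explanation for the failing link, not confirmed by anyone in the thread, is that it is a time-limited signed URL: its query string carries an `Expires` value that looks like a Unix timestamp. The sketch below only decodes that parameter. The helper names `expiry_of` and `is_expired` are made up for illustration, and treating `Expires` as seconds since the epoch is an assumption.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def expiry_of(url):
    """Return the UTC time encoded in an `Expires` query parameter, or None."""
    values = parse_qs(urlsplit(url).query).get("Expires")
    if not values or not values[0].isdigit():
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url, now=None):
    """True when the URL carries an `Expires` timestamp that is already in the past."""
    expiry = expiry_of(url)
    if expiry is None:
        return False
    return expiry < (now or datetime.now(timezone.utc))


# Path and Expires value copied from the failing link above;
# the Signature and Key-Pair-Id parameters are omitted for brevity.
url = "http://134.76.24.240/download/07876601/flc0596164z_p?Expires=1610722060"
print(expiry_of(url).isoformat())  # 2021-01-15T14:47:40+00:00
print(is_expired(url))
```

If that reading is right, links of this shape would need to be regenerated at the source rather than retried, which is a different failure from the timeouts on www.sigecweb.beniculturali.it.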
A URL for this dataset
https://doi.org/10.5281/zenodo.6367776
Dataset description
From the Zenodo page:
Object detection datasets are time-consuming to collect and there are relatively few datasets for object detection that use LAM data. Those that do exist often use the output of one of the various YOLO models which may be of some interest but often includes categories which are unlikely to be particularly useful for research/curation of LAM collections. This dataset, in contrast, includes categories related to smell: a topic of interest to both art historians and social historians. As a result, this dataset offers a much richer exploration of the possibilities of using object detection with historical paintings.
Dataset modality
Image
Dataset licence
Creative Commons Attribution 4.0 International
Other licence
No response
How can you access this data
Other
Confirm the dataset has an open licence
Contact details for data custodian
No response