Closed mdlincoln closed 8 years ago
Thanks Matthew. I'll have a look. Have you tried filtering for tool_id = 0 ?
If I filter for tool_id == 0, the count of unique refers_to_object_id values not found in the full collection object IDs drops to 325. I take this to mean that the non-pen-created IDs may not necessarily be for objects in the collection? What are they, then?
Matthew,
tool_id == 0 rows are things collected by the pen from our wall labels and interactive tables. These should all have a refers_to_object_id. All other tool_ids are applications that allow our visitors to "create" things, so they don't refer to a specific object. It looks like instead of a refers_to_object_id, we are sticking in a timestamp, which is confusing, and I will look into changing that so it is just NULL instead.
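To make the timestamp substitution described above concrete, here is a minimal Python sketch that separates timestamp-like values from plausible object IDs. It assumes the stray values are Unix epoch seconds falling in a 2014 to 2017 window; the thread does not state the timestamp format, so both the window and the helper names (`looks_like_timestamp`, `split_ids`) are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical window: assumes stray values are Unix epoch seconds recorded
# between 2014 and 2017. The thread does not confirm the timestamp format.
WINDOW_START = datetime(2014, 1, 1, tzinfo=timezone.utc).timestamp()
WINDOW_END = datetime(2017, 1, 1, tzinfo=timezone.utc).timestamp()


def looks_like_timestamp(value):
    """Return True if value falls inside the assumed epoch-seconds window."""
    if value is None:
        return False
    return WINDOW_START <= value < WINDOW_END


def split_ids(values):
    """Partition values into (plausible object IDs, timestamp-like values)."""
    ids, stamps = [], []
    for v in values:
        (stamps if looks_like_timestamp(v) else ids).append(v)
    return ids, stamps


# Sample input only: two values that appear in the output later in this thread.
ids, stamps = split_ids([68268431, 1427987696])
```

This is only a heuristic: a real object ID could in principle fall inside the window, so anything it flags is worth checking by hand.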
Ahh, that makes sense, as does the proposed change. Though I guess there are still those 325 (well, 324 if you don't count the NA/blank value) unmatched refers_to_object_id values in the tool_id == 0 entries, so I'm still not sure what is happening there.
Matthew D. Lincoln Ph.D Candidate Department of Art History & Archaeology http://arthistory.umd.edu University of Maryland College Park, MD 20742
mlincol1@umd.edu matthewlincoln.net
Ok, I've created a new branch with a cleaned-up dataset. Please have a look at 4ed4ebbb268a1b168cbc66c10b511284c49840ff, which sets the refers_to_object_id to 0 for all rows where tool_id != 0.
Let me know what you think...
-m
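For readers who want to reproduce this cleanup on their own copy of the data, a rough Python sketch of the rule follows. It is not the code from the commit; `clear_non_pen_object_ids` and the sample rows are hypothetical, and only the two column names come from this thread.

```python
def clear_non_pen_object_ids(rows, placeholder=0):
    """Return copies of rows with refers_to_object_id reset where tool_id != 0."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left untouched
        if row["tool_id"] != 0:
            row["refers_to_object_id"] = placeholder
        cleaned.append(row)
    return cleaned


# Hypothetical sample rows; tool_id 3 stands in for any non-pen application.
sample = [
    {"tool_id": 0, "refers_to_object_id": 68268431},
    {"tool_id": 3, "refers_to_object_id": 1427987696},
]
cleaned = clear_non_pen_object_ids(sample)
```

Passing `placeholder=None` instead would give the NULL-style behaviour proposed earlier in the thread.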
I checked the new dataset, and it's true that all refers_to_object_id values are 0 for rows where tool_id != 0.
However, there are now 631 refers_to_object_id values not found in the table of object IDs (again, this is all for tool_id == 0):
> setdiff(filter(pen_raw, tool_id == 0)$refers_to_object_id, ch$id)
[1] 68268431 68764321 68764307 68764215 68764253 68764317 68775175 68775167 68764299 682460
[11] 68268457 35520981 68774667 68775145 68764357 68245665 68764323 68764331 68764335 68764225
[21] 68764285 68764199 68764337 68764287 68764397 68775189 68764271 68764289 68775143 68782481
[31] 68764315 68268039 68782479 68774663 68764265 68775187 18187423 68764319 68764333 68268161
[41] 68813715 18709775 35457409 35520983 68764309 51497589 68246009 68250509 18693285 68764305
[51] 35520953 1429061025 68883101 68885261 69113423 69113425 68890011 68881899 68883035 68883325
[61] 69166469 68764195 68883485 68883491 69129735 68833531 68833533 68833545 84995375 84995371
[71] 84995373 68889901 68814023 102199991 18705229 0 6911349 691113423 8006405 850064599
[81] 68743193 6268255 682682511 15460289 68745573 1875837 187558429 102391977 152749789 102391981
[91] 102391985 135706757 69167759 404577581 68268453 68743489 68743501 69154985 136252037 136252039
[101] 136252041 136252043 136252045 8500465 855006457 1870931 1874073 6268299 354474659 682245681
[111] 13625243 1362552039 1362524889 NA 6825813 688250815 850064479 6828299 354744659 1874027
[121] 6824554 6826299 354746659 69155067 102335187 404529301 69129535 6820787 668250791 35350945
[131] 13625262 69155413 69155025 69155069 69155075 69154997 69154999 69155003 69155005 69155007
[141] 69155009 69155011 69155013 69155015 69155017 69155021 69155023 69155045 69155047 69155049
[151] 69155051 69155053 69155059 69155061 69155063 69155065 69155077 69155081 69155083 69155087
[161] 69155093 69155099 69155119 69155125 69155129 69155131 69155133 69155151 69155153 69155155
[171] 69155157 69155159 69155161 69155165 69155167 69155169 69155171 69155173 69155177 69155179
[181] 69155183 69155185 69155187 69155189 69155191 69155193 69155201 69155205 69155207 69155209
[191] 69155211 69155213 69155215 69155219 69155221 69155223 69155225 69155227 69155229 69155231
[201] 69155233 69155241 69155249 69155251 69155255 69155259 69155261 69155263 69155265 69155269
[211] 69155277 69155279 69155281 69155331 69155333 69155337 69155339 69155347 69155349 69155351
[221] 69155353 69155355 69155359 69155363 69155365 69155367 69155369 69155373 69155407 69172057
[231] 69172059 69172061 69172063 69172067 69172069 69172071 69172073 69172075 69172077 69172079
[241] 69172081 69172085 69172087 69172089 69172091 69172093 69172095 69172097 69172099 69172103
[251] 69172105 69172107 69172109 69172111 69192417 69192419 69192421 69192431 69192433 69192435
[261] 69192437 69192439 69192443 69192445 69192449 69192451 69192453 69192455 69192457 69192461
[271] 69192463 69192465 69192467 69192469 69192471 69192473 69192475 69192479 69192481 69192483
[281] 69192505 69192507 69192509 69192517 69192519 69192521 69192523 69192525 69192527 69192529
[291] 69192533 69192535 69192537 69193859 69193867 69193869 69193871 69193873 69193875 69193877
[301] 69193879 69193883 69193885 69193887 69193889 69193891 69193893 69193895 69193897 69193901
[311] 69193903 69193905 69193907 69193909 69193911 69193913 69193915 69193921 69193925 69193927
[321] 69193929 69193931 69193933 102199993 102199997 102335183 102335185 102335189 102335191 135918413
[331] 135918421 135918427 135918429 135918431 135918443 135918447 136300679 404529303 404529305 404529307
[341] 404529311 404529313 404529315 404529317 404529319 404529321 404529323 404529325 404529329 404529331
[351] 404529333 404529335 404529337 404529339 404529341 404529343 404529347 404529349 404529351 404529591
[361] 404584055 404584057 404584275 404584277 404584279 404584283 404584285 404584287 404584289 404584291
[371] 404584293 404584295 404584297 404584301 404584303 404584305 404584307 404584309 404584311 404584313
[381] 404584315 404584319 404584321 404584323 404584325 404584327 420560745 420565457 420565459 420565465
[391] 420565477 420565483 420565485 420565487 420565489 420565493 420565495 420565501 420565503 420565507
[401] 420565513 69155275 69155335 69155381 69155383 69155385 69155387 69155389 69155391 69155395
[411] 69155399 69155401 69155403 69155405 69155057 69155377 69192511 69192515 69155197 69155203
[421] 69155027 69155029 69155031 69155041 420565463 69192485 69192487 69192489 69192491 69192493
[431] 69192497 69192499 404734343 404734345 152749795 1355918421 6915506 691555059 1416270910 1416270945
[441] 1425682956 1427996726 1428000917 1427728880 1427997175 1427994804 1427994173 1427994785 1427994988 1427993898
[451] 1427994896 1427995766 1427992990 1427996760 1427993319 1427994457 1427994549 1427997049 1427992978 1427993219
[461] 1427991251 1427991707 1427987696 1427992099 1427987081 1427988011 1427987524 1427991672 1427987070 1427987125
[471] 1427987256 1427987343 1427987821 1427987548 1427996676 1427987011 1427987118 1427987243 1427987292 1427991485
[481] 1427988153 1427988391 1427988287 1427994271 1427987056 1427987093 1427988192 1427987772 1427987689 1427990243
[491] 1427992084 1427991961 1427992091 1427989879 1427990066 1427988952 1427988851 1427988898 1427992489 1427992405
[501] 1427992222 1427992340 1427994790 1427995492 1427995276 1427995611 1427992824 1427998034 1427997782 1427998021
[511] 1427998334 1427998476 1427998503 1427998518 1427998566 1427998627 1427998911 1427997590 1427997729 1427997744
[521] 1427997809 1427997817 1427997827 1427997671 1427997687 1427997804 1427998052 1427998320 1427998458 1427998484
[531] 1427998492 1427998585 1427998637 1427998846 1427998890 1427998908 1427998893 1427998901 1427996320 1427996489
[541] 1427996612 1427999818 1427996314 1427995033 1427995076 1427995208 1427997992 1427998163 1427998448 1428001333
[551] 1428002213 1428002326 1428002569 1428002021 1428002110 1428002263 1428002464 1428002522 1428002382 1428006103
[561] 1428005848 1428005993 1428000737 1428001747 1428002048 1428002133 1428001886 1428001965 1428002206 1427999593
[571] 1428001267 1428004099 1428003702 1428007764 1428007855 1428007912 1428004360 1428005652 1428005941 1428005902
[581] 1428004265 1428004821 1428005576 1428007103 1428006699 1428007645 1428007815 1428008878 1428010676 1428010974
[591] 1428008888 1428010163 1428010533 1428010350 1428011448 1428075511 1428075367 1428075726 1428075297 1428075971
[601] 1428074030 1428074293 1428077365 1428077264 1428075533 1428076227 1428076387 1428076741 1428077689 1428078030
[611] 1428079021 1428078196 1428081162 1428080727 1428081762 1428080039 1428078571 1428082909 1428079037 1428081409
[621] 1428081421 1428082072 1428080293 1428080265 1428082581 1428081516 1428081312 1428081322 1428081849 1428082845
[631] 1428082989
Ah, I see what's happening. Those 631 things (there are more now, since this is newer than the original release) are objects that are not currently set to public on our collections site. For example, the first object you have listed, https://collection.cooperhewitt.org/objects/68268431, should show you a "not authorized" page when you load it in a browser. If it isn't public on the website, it probably doesn't get added to the collection data on GitHub.
There are some that still come back as not found. I'm not sure what's going on there, and then there are some like https://collection.cooperhewitt.org/objects/69192497 that do work, but likely haven't been updated in the GitHub repo as of yet.
-m
Ahhhh, that makes sense! Depending on if/how you update the collection data repo to represent not-yet-authorized objects, it'd be great to have that documented in this repo's README as well - even if the answer is just "IDs not found in the objects table are just not public yet".
Good luck tracking down those other missing IDs - and thanks for checking all this out!
ok great. So I merged in the new dataset and updated the readme. Closing this now...
IIUC, refers_to_object_id should match up with the object IDs made available in https://github.com/cooperhewitt/collection. However, when trying to join the objects.csv created by bin/generate-csv-objects.py, I find that the vast majority of refers_to_object_id values are not found in the id column of the collections data:
Am I trying to associate with the wrong column?
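One way to sanity-check a join like this is to compare the two ID columns as normalized strings, which rules out a type or whitespace mismatch as the cause. The Python sketch below uses only the standard library. The in-memory CSV contents, the `title` column, the ID 18704235, and the helper name `unmatched_ids` are all hypothetical stand-ins for the real pen data and objects.csv.

```python
import csv
import io


def unmatched_ids(pen_rows, object_rows):
    """Return refers_to_object_id values absent from the objects id column.

    Both sides are normalized to stripped strings so that a mismatch such as
    "123" versus 123 does not produce a false "not found".
    """
    known = {str(r["id"]).strip() for r in object_rows}
    seen = {str(r["refers_to_object_id"]).strip() for r in pen_rows}
    return sorted(seen - known)


# Hypothetical in-memory stand-ins for the pen data and objects.csv.
pen_csv = io.StringIO("tool_id,refers_to_object_id\n0,68268431\n0,18704235\n")
objects_csv = io.StringIO("id,title\n18704235,Example object\n")

missing = unmatched_ids(csv.DictReader(pen_csv), csv.DictReader(objects_csv))
```

If IDs are still missing after normalization, the column choice is probably right and the gap lies in the data itself.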