RDFLib / pymicrodata

This is a module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by the W3C Semantic Web Interest Group task force in March 2012. The module can be used to produce serialized versions of the extracted graph, or simply an RDFLib Graph object.
http://www.w3.org/2012/pyMicrodata/

Turn off the rdf List generation? #3

Open danbri opened 10 years ago

danbri commented 10 years ago

Is there a way to turn off the list generation?

e.g. the example from http://schema.org/TVSeason

... is there a way to avoid all the first/rest stuff?

:Nc1fdd703612347a68ecb956d14e81542 http://www.w3.org/1999/02/22-rdf-syntax-ns#first :Nb3c884b3ffea425e976818d348c61a6c . :N606548c887634be18ea22276a125b2dd http://schema.org/name "Jessica Capshaw" . :N84fa0e455314402bbe92e9c24e2fff1e http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/Person . :Nb3c884b3ffea425e976818d348c61a6c http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/TVSeason . :N84fa0e455314402bbe92e9c24e2fff1e http://schema.org/name "Shonda Rimes" . :N383b433e7e7d4f33af5558a8a0332e87 http://schema.org/datePublished "2006-05-14" . :Nb3c884b3ffea425e976818d348c61a6c http://schema.org/numberOfEpisodes "14" . :N606548c887634be18ea22276a125b2dd http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/Person . :N1bda1e22f4974a6c9b0a8a14c5d827e1 http://schema.org/name "Greys Anatomy" . :N51a6d0e636f34cc1be79c8dea67e046f http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/Person . :Nfbdefb1d62944d24a4259265ca8fef3e http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/TVEpisode . :N666ff0980cb14865beb1312ad601600b http://www.w3.org/1999/02/22-rdf-syntax-ns#rest http://www.w3.org/1999/02/22-rdf-syntax-ns#nil . :Nb3c884b3ffea425e976818d348c61a6c http://schema.org/name "Season 1" . :Nc1fdd703612347a68ecb956d14e81542 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest :N509763638b694c4191fa4b7b2665c631 . :N1bda1e22f4974a6c9b0a8a14c5d827e1 http://schema.org/actor :N2d3e495c0b524e2bbda79dd00c19db8b . :N509763638b694c4191fa4b7b2665c631 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest http://www.w3.org/1999/02/22-rdf-syntax-ns#nil . :N1bda1e22f4974a6c9b0a8a14c5d827e1 http://schema.org/season :Nc1fdd703612347a68ecb956d14e81542 . :N32d2c32dcbb9444da07bcae1b1179073 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest http://www.w3.org/1999/02/22-rdf-syntax-ns#nil . :N32d2c32dcbb9444da07bcae1b1179073 http://www.w3.org/1999/02/22-rdf-syntax-ns#first :Nfbdefb1d62944d24a4259265ca8fef3e . 
:N07a7522d6b274b01ad178b2c660c3dc4 http://www.w3.org/1999/02/22-rdf-syntax-ns#first :N606548c887634be18ea22276a125b2dd . :N51a6d0e636f34cc1be79c8dea67e046f http://schema.org/name "Justin Chambers" . :N1bda1e22f4974a6c9b0a8a14c5d827e1 http://schema.org/author :N84fa0e455314402bbe92e9c24e2fff1e . :N383b433e7e7d4f33af5558a8a0332e87 http://schema.org/numberOfEpisodes "27" . :N383b433e7e7d4f33af5558a8a0332e87 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/TVSeason . :Nfbdefb1d62944d24a4259265ca8fef3e http://schema.org/name "Episode 1" . :N666ff0980cb14865beb1312ad601600b http://www.w3.org/1999/02/22-rdf-syntax-ns#first :N1bda1e22f4974a6c9b0a8a14c5d827e1 . :N383b433e7e7d4f33af5558a8a0332e87 http://schema.org/name "Season 2" . file:tv1.html http://www.w3.org/ns/rdfa#usesVocabulary http://schema.org/ . :N509763638b694c4191fa4b7b2665c631 http://www.w3.org/1999/02/22-rdf-syntax-ns#first :N383b433e7e7d4f33af5558a8a0332e87 . :N383b433e7e7d4f33af5558a8a0332e87 http://schema.org/episode :N32d2c32dcbb9444da07bcae1b1179073 . file:tv1.html http://www.w3.org/ns/md#item :N666ff0980cb14865beb1312ad601600b . :N2d3e495c0b524e2bbda79dd00c19db8b http://www.w3.org/1999/02/22-rdf-syntax-ns#rest :N07a7522d6b274b01ad178b2c660c3dc4 . :Nfbdefb1d62944d24a4259265ca8fef3e http://schema.org/episodeNumber "1" . :Nb3c884b3ffea425e976818d348c61a6c http://schema.org/datePublished "2005-05-22" . :N1bda1e22f4974a6c9b0a8a14c5d827e1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/TVSeries . :N07a7522d6b274b01ad178b2c660c3dc4 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest http://www.w3.org/1999/02/22-rdf-syntax-ns#nil . :N2d3e495c0b524e2bbda79dd00c19db8b http://www.w3.org/1999/02/22-rdf-syntax-ns#first :N51a6d0e636f34cc1be79c8dea67e046f .

iherman commented 10 years ago

Well... the generation 'simply' implements what is in the transformation note:

http://www.w3.org/TR/microdata-rdf/

There is a value for controlling this:

http://www.w3.org/TR/microdata-rdf/#value-ordering

The registry for schema.org looks as follows:

"http://schema.org/": {
  "propertyURI": "vocabulary",
  "multipleValues": "unordered",
  "properties": {
    "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
    "blogPosts": {"multipleValues": "list"},
    "breadcrumb": {"multipleValues": "list"},
    "byArtist": {"multipleValues": "list"},
    "creator": {"multipleValues": "list"},
    "episode": {"multipleValues": "list"},
    "episodes": {"multipleValues": "list"},
    "event": {"multipleValues": "list"},
    "events": {"multipleValues": "list"},
    "founder": {"multipleValues": "list"},
    "founders": {"multipleValues": "list"},
    "itemListElement": {"multipleValues": "list"},
    "musicGroupMember": {"multipleValues": "list"},
    "performerIn": {"multipleValues": "list"},
    "actor": {"multipleValues": "list"},
    "actors": {"multipleValues": "list"},
    "performer": {"multipleValues": "list"},
    "performers": {"multipleValues": "list"},
    "producer": {"multipleValues": "list"},
    "recipeInstructions": {"multipleValues": "list"},
    "season": {"multipleValues": "list"},
    "seasons": {"multipleValues": "list"},
    "subEvent": {"multipleValues": "list"},
    "subEvents": {"multipleValues": "list"},
    "track": {"multipleValues": "list"},
    "tracks": {"multipleValues": "list"}
  }
},

This can of course be changed/updated, but you tell us what and how...
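To make the role of the registry concrete, here is a minimal sketch of a lookup that decides whether a property's multiple values become an rdf:List. The helper name `multiple_values_mode`, the trimmed registry, and the "unordered" fallback for unknown vocabularies are assumptions made for illustration; this is not pyMicrodata's actual code, only the override rule as the value-ordering section appears to describe it.

```python
import json

# Trimmed copy of the schema.org registry entry (only two properties kept).
REGISTRY = json.loads("""
{
  "http://schema.org/": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "actor": {"multipleValues": "list"},
      "season": {"multipleValues": "list"}
    }
  }
}
""")


def multiple_values_mode(registry, vocab, prop, default="unordered"):
    """Return "list" or "unordered" for a property (hypothetical helper)."""
    entry = registry.get(vocab, {})
    # Vocabulary-level setting, or the assumed default for an unknown vocabulary.
    mode = entry.get("multipleValues", default)
    # A property-level setting, when present, overrides the vocabulary-level one.
    prop_entry = entry.get("properties", {}).get(prop, {})
    return prop_entry.get("multipleValues", mode)


print(multiple_values_mode(REGISTRY, "http://schema.org/", "actor"))  # list
print(multiple_values_mode(REGISTRY, "http://schema.org/", "name"))   # unordered
```

With a lookup like this, removing the per-property "list" entries would make every schema.org property fall back to the vocabulary-level "unordered" setting.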

That being said, isn't it correct that if this is serialized in Turtle, it would generate the (...) syntax? Which looks o.k....

Ivan

danbri commented 10 years ago

I tried cutting this down, in my checked out version,

_registry = """
{
  "http://schema.org/": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
      "multipleValues": "unordered"
    }
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered"
  },
  "http://microformats.org/profile/hcalendar#": {
    "propertyURI": "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "categories": {"multipleValues": "list"}
    }
  }
}
"""

...but I'm still getting some firsts and rests:

danbri-macbookpro:ogp danbri$ ./sdo2ogp.py -n tv1.html :N2f65cbc9dd0e42ef8483d5097b82f65a http://schema.org/episode :Na2e279ea6eea40bcb52f117d6300397a . :N1ce362ef698f4d40a6e3a2cc2fa4e3d5 http://schema.org/season :N2f65cbc9dd0e42ef8483d5097b82f65a . :Nbdbaceff91db4cfd82f78e9a90b6e669 http://schema.org/name "Shonda Rimes" . :N52ee706f444546b0a3adc175df45a196 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest http://www.w3.org/1999/02/22-rdf-syntax-ns#nil .

http://www.w3.org/ns/rdfa#usesVocabulary http://schema.org/ . _:Na2922616629d4f8481eb10061f23e284 http://schema.org/name "Justin Chambers" . _:N2e27c6cd66924f3ba0b7a660380a8ed9 http://schema.org/name "Jessica Capshaw" . _:Nea03bdda291f4e709dab4fdec44c1ce8 http://schema.org/datePublished "2005-05-22" . _:N1ce362ef698f4d40a6e3a2cc2fa4e3d5 http://schema.org/actor _:Na2922616629d4f8481eb10061f23e284 . _:N2f65cbc9dd0e42ef8483d5097b82f65a http://schema.org/datePublished "2006-05-14" . _:N1ce362ef698f4d40a6e3a2cc2fa4e3d5 http://schema.org/season _:Nea03bdda291f4e709dab4fdec44c1ce8 . _:Na2e279ea6eea40bcb52f117d6300397a http://schema.org/name "Episode 1" . _:Nea03bdda291f4e709dab4fdec44c1ce8 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/TVSeason . _:N1ce362ef698f4d40a6e3a2cc2fa4e3d5 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/TVSeries . _:Nea03bdda291f4e709dab4fdec44c1ce8 http://schema.org/name "Season 1" . _:Na2922616629d4f8481eb10061f23e284 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/Person . _:N2f65cbc9dd0e42ef8483d5097b82f65a http://schema.org/numberOfEpisodes "27" . _:N2f65cbc9dd0e42ef8483d5097b82f65a http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/TVSeason . _:Na2e279ea6eea40bcb52f117d6300397a http://schema.org/episodeNumber "1" . _:N1ce362ef698f4d40a6e3a2cc2fa4e3d5 http://schema.org/author _:Nbdbaceff91db4cfd82f78e9a90b6e669 . http://www.w3.org/ns/md#item _:N52ee706f444546b0a3adc175df45a196 . _:Na2e279ea6eea40bcb52f117d6300397a http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/TVEpisode . _:N2f65cbc9dd0e42ef8483d5097b82f65a http://schema.org/name "Season 2" . _:N1ce362ef698f4d40a6e3a2cc2fa4e3d5 http://schema.org/actor _:N2e27c6cd66924f3ba0b7a660380a8ed9 . _:Nea03bdda291f4e709dab4fdec44c1ce8 http://schema.org/numberOfEpisodes "14" . _:Nbdbaceff91db4cfd82f78e9a90b6e669 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/Person . 
_:N1ce362ef698f4d40a6e3a2cc2fa4e3d5 http://schema.org/name "Greys Anatomy" . _:N52ee706f444546b0a3adc175df45a196 http://www.w3.org/1999/02/22-rdf-syntax-ns#first _:N1ce362ef698f4d40a6e3a2cc2fa4e3d5 . _:N2e27c6cd66924f3ba0b7a660380a8ed9 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/Person .
iherman commented 10 years ago

Ack. I will try to find some time to look at it; there may be a bug...

Ivan

iherman commented 10 years ago

Well,

I looked at this and, unfortunately, this is correct (again, the structure is much more visible in Turtle). The microdata->RDF conversion is defined to produce a top-level triple of the form

<> md:item ( item1 item2 item3 )

This is because the microdata specification is adamant on keeping the order of things as they appear in the HTML source. This is what you see in this case, although the list consists of one single entry with the reduced registry...
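To show why even a one-entry list still produces rdf:first and rdf:rest triples, here is a small sketch of how a Turtle collection expands into triples. The helper `expand_list` and the blank-node labels are made up for illustration, and plain tuples stand in for RDFLib terms.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MD_ITEM = "http://www.w3.org/ns/md#item"


def expand_list(items):
    """Expand an ordered list into rdf:first/rdf:rest triples.

    Returns (head, triples); the head is rdf:nil for an empty list.
    """
    if not items:
        return RDF + "nil", []
    # One list cell (blank node) per item; the labels are arbitrary.
    cells = [f"_:cell{i}" for i in range(1, len(items) + 1)]
    triples = []
    for i, (cell, item) in enumerate(zip(cells, items)):
        nxt = cells[i + 1] if i + 1 < len(cells) else RDF + "nil"
        triples.append((cell, RDF + "first", item))
        triples.append((cell, RDF + "rest", nxt))
    return cells[0], triples


# A page with a single top-level item: <> md:item ( _:series )
head, triples = expand_list(["_:series"])
triples.append(("<>", MD_ITEM, head))
for triple in triples:
    print(*triple)
```

The single item yields one rdf:first triple, one rdf:rest triple pointing at rdf:nil, and the md:item triple itself, which matches the leftover first/rest pair in the reduced-registry output.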

:-(

Gregg, I wonder whether we should not issue a new version of the note with an updated schema.org structure...

Ivan

danbri commented 10 years ago

Can you suggest a hack that will make this parser emit non-List-based triples from schema.org microdata?

iherman commented 10 years ago

Well... honestly, I do not see anything clean. Of course, it can be removed from the code, but I would not like to do so; the implementation simply implements what the microdata model is in RDF. In other words, the only clean way would be to go 'back' to the source, change either the microdata model itself or its RDF mapping, ie, declaring the order to be irrelevant. But I do not think the implementation should deviate from the spec.

Speaking as an individual, and knowing the usage of microdata, I think the fact that the microdata specification insists on keeping the top level order is indeed irrelevant, but that is only my opinion. On the other hand, the fact that some of the terms, say, in schema.org keep the order makes a lot of sense; a good example is 'author'.

Ivan

danbri commented 10 years ago

On 20 February 2014 07:49, Ivan Herman notifications@github.com wrote:

Well... honestly, I do not see anything clean. Of course, it can be removed from the code, but I would not like to do so; the implementation simply implements what the microdata model is in RDF. In other words, the only clean way would be to go 'back' to the source, change either the microdata model itself or its RDF mapping, ie, declaring the order to be irrelevant. But I do not think the implementation should deviate from the spec.

Speaking as an individual, and knowing the usage of microdata, I think the fact that the microdata specification insists on keeping the top level order is indeed irrelevant, but that is only my opinion. On the other hand, the fact that some of the terms, say, in schema.org keep the order makes a lot of sense; a good example is 'author'.

I'm not aware of anyone relying on document order for schema.org semantics. Other Microdata users might vary (but I'm not aware of any of those either).

That said, it is perfectly proper for a doc format like HTML+Microdata to say that order matters, and that in principle they could be relied on. In practice it adds so much complexity to the graph, that the simplified unordered view is super useful. And extracting such a view is entirely reasonable; it seems a shame if this tool makes it hard.
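One possible way to get that simplified unordered view without touching the parser is to flatten the lists after extraction. The sketch below works on plain (subject, predicate, object) tuples rather than an RDFLib Graph; `flatten_lists` is a hypothetical helper, not part of pyMicrodata, and it assumes the lists are not nested inside one another.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def flatten_lists(triples):
    """Replace each triple pointing at an rdf:List with one triple per member."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}

    def members(node):
        # Walk the rdf:first/rdf:rest chain; `seen` guards against cycles.
        seen = set()
        while node in first and node not in seen:
            seen.add(node)
            yield first[node]
            node = rest.get(node, NIL)

    out = []
    for s, p, o in triples:
        if p in (FIRST, REST):
            continue  # drop the list plumbing itself
        if o == NIL:
            continue  # an empty list has no members to emit
        if o in first:
            out.extend((s, p, m) for m in members(o))
        else:
            out.append((s, p, o))
    return out


ACTOR = "http://schema.org/actor"
NAME = "http://schema.org/name"
triples = [
    ("_:series", ACTOR, "_:l1"),
    ("_:l1", FIRST, "_:justin"),
    ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, "_:jessica"),
    ("_:l2", REST, NIL),
    ("_:series", NAME, '"Greys Anatomy"'),
]
flat = flatten_lists(triples)
for triple in flat:
    print(*triple)
```

The ordering information is discarded on purpose here, so this only suits consumers who, as described above, do not rely on document order.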

dbs commented 8 years ago

Per https://github.com/w3c/microdata-rdf/issues/6, the second edition of the Microdata to RDF transform specification (https://www.w3.org/TR/microdata-rdf/) suggests that the top-level md:item should be dropped. It would be great to see the pyMicrodata parser plugin for rdflib follow that recommendation; right now I'm crawling a bunch of HTML and extracting RDFa, JSON-LD, and microdata into a triple store, and the md:item misdirection is a pain.
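Until the parser follows that recommendation, a post-processing step along these lines could strip the top-level md:item list from the extracted triples. This is only a sketch on plain (subject, predicate, object) tuples; `drop_md_item` is a hypothetical helper and not an RDFLib or pyMicrodata API.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST = RDF + "first", RDF + "rest"
MD_ITEM = "http://www.w3.org/ns/md#item"


def drop_md_item(triples):
    """Remove md:item triples and the list cells they point to."""
    rest = {s: o for s, p, o in triples if p == REST}
    cells = set()
    for s, p, o in triples:
        if p == MD_ITEM:
            # Collect every cell of the top-level list by following rdf:rest.
            node = o
            while node in rest and node not in cells:
                cells.add(node)
                node = rest[node]
    return [
        (s, p, o)
        for s, p, o in triples
        if p != MD_ITEM and not (s in cells and p in (FIRST, REST))
    ]


triples = [
    ("file:tv1.html", MD_ITEM, "_:top"),
    ("_:top", FIRST, "_:series"),
    ("_:top", REST, RDF + "nil"),
    ("_:series", RDF + "type", "http://schema.org/TVSeries"),
]
cleaned = drop_md_item(triples)
for triple in cleaned:
    print(*triple)
```

The items themselves are left untouched; only the md:item triple and its list cells are removed.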

iherman commented 8 years ago

@dbs, I have made the changes and issued a PR for this:

https://github.com/RDFLib/rdflib/pull/443

However, this PR has never been incorporated. It seems that there were problems with the procedures to accept the PR; to be honest, I do not remember all the details. The automatic system, I believe, reported some errors that, in my view, aren't errors, and we got stuck. Very honestly, I have not looked at this since then (more than a year ago) and the facts are hazy. If somebody can unblock this situation, that would be good but, really, I cannot do it these days (I have pretty much moved away from RDFLib...)

/Cc: @joernhees

joernhees commented 8 years ago

heh, that's still on my todo list as well and then life happened... :-/

let's keep the discussion in https://github.com/RDFLib/rdflib/pull/443 and close this when https://github.com/RDFLib/rdflib/pull/443 is resolved

dbs commented 8 years ago

Thanks for the update, @iherman -- I'll try to put some more effort into resolving RDFLib/rdflib#443 as well!

iherman commented 8 years ago

On 25 Jan 2016, at 15:20, Dan Scott notifications@github.com wrote:

Thanks for the update, @iherman -- I'll try to put some more effort into resolving RDFLib/rdflib#443 as well!

I appreciate that! Thanks.

Ivan

joernhees commented 8 years ago

https://github.com/RDFLib/rdflib/pull/587