Open nigelcharman opened 7 months ago
xssInputRule specifies the response when code is detected. The options are rejectInvalid or sanitizeInvalid. The rejectInvalid value is the default and is recommended.
Other potential fixes are:
This assumes that the iNaturalist description is the only one that we use that can contain HTML. This post states that Markdown is supported on comments, identifications, journal posts, and mostly on user profiles and project descriptions. Presumably HTML is the same?
@amazing-will I'm thinking you could reproduce by copying the NotesAndDetails value:
'NotesAndDetails': '<em>Locality: NEW ZEALAND AK, suburb of Glen Innes, Paddington Reserve (W Tamaki Rd entrance).\r\n\r\n<em>Habitat: One large plant, 3-4 m high. The plant is visible on <a href="https://www.google.co.nz/maps/@-36.870013,174.8628706,3a,21.3y,172.22h,83.33t/data=!3m6!1e1!3m4!1sWNeFx4TNIV6kKLgalvaGuA!2e0!7i13312!8i6656!6m1!1e1">Street View - Nov 2015</a> (above the green transformer). I have uploaded a screen shot, but note that the screen shot shows the plant in Nov 2015.\r\n\r\n<em>Identification: </em><a href="http://naturewatch.org.nz/listed_taxa/5251492">Solanum mauritianum</a><em> Scop., 1788.'
For end-to-end testing, you can modify the config/sync_configuration.json
file to pull through all Woolly Nightshade observations. Change:
"Woolly nightshade - Kaipatiki": {
"file_prefix": "woolly_nightshade_kaipatiki",
"taxon_ids": ["133287"],
"place_ids": ["123353"]
},
to:
"Woolly nightshade - NZ": {
"file_prefix": "woolly_nightshade_nz",
"taxon_ids": ["133287"],
"place_ids": ["6803"]
},
Reproduced in a feature test using the whole 'NotesAndDetails' field as above.
It seems to be the map reference:
<a href="https://www.google.co.nz/maps/@-36.870013,174.8628706,3a,21.3y,172.22h,83.33t/data=!3m6!1e1!3m4!1sWNeFx4TNIV6kKLgalvaGuA!2e0!7i13312!8i6656!6m1!1e1">Street View - Nov 2015</a>
that causes the issue. If I remove the end of the URL from /maps/ onwards, it syncs fine.
This suggests a fix would be to check for any map references and remove them.
That seems a bit specific? I wonder if it might fail on other hrefs or other HTML content? Are you able to try some other HTML strings?
We could possibly replace the invalid HTML content with a message like INVALID CONTENT DETECTED - see iNaturalist link for full notes
Looking into this a bit more, it's the "=" symbol in the URL. href="https://www.otherplace.co.nz/@-aw,wor.d/data,1!23" is okay, but href="https://www.otherplace.co.nz/@-aw,wor.d/data=,1!23" fails.
On that basis, we should search any URL for an "=" and output an INVALID CONTENT message.
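A minimal sketch of that check, assuming the notes arrive as a single string. The names `has_equals_in_href` and `replace_if_invalid` are hypothetical and not taken from the repository; the message text is the one suggested earlier in this thread.

```python
import re

# Placeholder text suggested earlier in this thread.
INVALID_CONTENT_MESSAGE = "INVALID CONTENT DETECTED - see iNaturalist link for full notes"

# Matches href="..." or href='...' and captures the URL between the quotes.
HREF_PATTERN = re.compile(r"""href\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def has_equals_in_href(html: str) -> bool:
    """Return True if any href URL in the text contains an '=' character."""
    return any("=" in match.group(2) for match in HREF_PATTERN.finditer(html))


def replace_if_invalid(html: str) -> str:
    """Hypothetical helper: swap the whole field for the placeholder message
    when a URL containing '=' is found, otherwise return the text unchanged."""
    return INVALID_CONTENT_MESSAGE if has_equals_in_href(html) else html
```

This only covers the "=" case found so far; other HTML that the CAMS write rejects would still get through.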
Wow. That's kind of like an edge case of an edge case :) Yep, please implement that!
Question: are there any other HTML fields we should perhaps check?
I think it's unlikely. My understanding is that it's only supported on comments, descriptions and posts. https://forum.inaturalist.org/t/useful-html-tags-for-inaturalist-comments-and-other-text-wiki/6198/43
<div attrib='word'> </div>
will also cause an invalid HTML error in the CAMS write. But <div> </div>
is okay. I've added a sanitiseHTML method so we can add other cases as we find them. I just hope the list doesn't keep growing and growing. We could instead remove the HTML altogether.
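For comparison, a rough sketch of the "remove HTML" alternative using only the Python standard library. This is not the sanitiseHTML method mentioned above; `strip_html` and `_TextExtractor` are hypothetical names, and the approach assumes it is acceptable to lose link targets because the iNaturalist link still gives the full notes.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text content, discarding all tags and attributes."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._parts = []

    def handle_data(self, data: str) -> None:
        self._parts.append(data)

    def text(self) -> str:
        return "".join(self._parts)


def strip_html(value: str) -> str:
    """Hypothetical helper: return the field with every HTML tag removed."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

With this approach there is no list of special cases to maintain, at the cost of dropping formatting and URLs from the notes.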
https://github.com/EcoNet-NZ/inaturalist-to-cams/actions/runs/7937935337 fails with the error "Field NotesAndDetails has invalid html content." when synchronising https://www.inaturalist.org/observations/2763972.
NotesAndDetails is shown as:
This needs fixing to allow weeds to be synchronised nationally. For now I have reverted to just pull in Woolly Nightshade observations from Kaipatiki.