Open KatiRG opened 1 year ago
@KatiRG This is when you are using ckanext.validation.run_on_create_sync = True
and ckanext.validation.run_on_update_sync = True
?
To my understanding, I thought that when using the validation extension in sync
mode, it runs the validation on resource creates and updates. And if that fails validation, then it will not create/update the Resource.
What version of CKAN are you running?
@JVickery-TBS We have the following settings:
ckanext.validation.run_on_create_sync = True
ckanext.validation.run_on_update_sync = True
ckanext.validation.run_on_update_async = False
ckanext.validation.formats = csv
ckanext.validation.show_badges_in_listings = True
and we are using CKAN 2.9.7. ckanext.validation.run_on_update_sync
needs to be true otherwise an uploaded file will not be validated. But xloader is pushing files that don't pass validation for some reason.
Update: it errors out because of a call to delete the new resource here:
https://github.com/frictionlessdata/ckanext-validation/blob/master/ckanext/validation/logic.py#L679
Creating a new resource with an invalid CSV file means there is no resource in the database, therefore the call to delete it (t.get_action(u'resource_delete')
) will fail.
@KatiRG sorry for the sporadic replies! I am able to recreate this locally. So am just going to debug through it a bit to see what the issue is.
But it looks like the Resource gets created for me, but then yes, I also get hit with that 404 but the web address is still /resource/new
which generally means that something is returning the Template/render method in a resource create view override instead of returning an abort or redirect.
Hopefully will have more for you soon!
Okay I found the issue:
resource_validation_run
, it calls resource_show
;resource_show
action method, it loops through the package to confirm that the resource belongs to it (https://github.com/ckan/ckan/blob/2.9/ckan/logic/action/get.py#L1097), here it fails and raises a NotFound;So I am now just debugging more here, and seeing if it would be safe to catch the NotFound exception in resource_validation_run
to fix this here. Im sure we can fix this some how in the resource_validation_run
method
Thanks @JVickery-TBS !
I found that removing the delete action is sufficient to resolve this error. The check in logic.py
already checks if the report if valid: https://github.com/KatiRG/ckanext-validation/blob/bug_94/ckanext/validation/logic.py#L677
If the resource is not valid (as checked by resource_validation_run
), then
/var/lib/ckan/default/resources
, and will get deleted by resource_validation_delete
herelocal_upload
is True and will be deleted herePlease check if this solution also works for you! I can make a PR if you like.
Screenshot of new resource create, when an invalid file has been uploaded. The resource does not get pushed to the database:
There is one issue left which is to not display the invalid resource in the dataset page:
@KatiRG The error that you original got, and that I was able to replicate was coming from Xloader. Let me know you are running with xloader plugin installed here.
So, yes, the validation failure would delete the resource, but then Xloader uses IResourceUrlChange
which is a hook that happens on before_commit
. In Xloader's notify
method, it does the resource_show
and of course at this point the resource has been deleted by Xloader. But then the raised exception happens and then Validation extension is halted and the delete resource session commit does not happen and the Resource gets created anyways.
I personally like to use IDomainObjectModification
implementation instead of IResourceUrlChange
because you have more parameters in the IDomainObjectModification
notify method. And, it automatically catches exceptions, logs them, and passes them because CKAN knows that extensions will override action methods and session commits.
I already have a PR on the xloader repo which would actually solve this problem: https://github.com/ckan/ckanext-xloader/pull/196/files
As for the functionality of the Validation extension and how synchronis validation works. It is to my understanding that Resource do NOT get created or updated if the validation fails (which happens on form submission or API POST/PATCH). Thus, the Resource would not show up on the Dataset page.
If you want to have the Resources get created even if they fail validation, then you would want to use async. This is my understanding of how the developers of the Validation extension wanted it to work.
That being said, there could be another option here to make Resources that do not pass validation become Draft instead of just deleting them.
https://github.com/ckan/ckanext-xloader/pull/198 <- PR for quicker acceptance into Xloader here.
https://github.com/qld-gov-au/ckanext-xloader/pull/69 <- and the Queensland one, if you use that fork instead.
@KatiRG ah sorry, I understand your "There is one issue left which is to not display the invalid resource in the dataset page" comment. Sorry!
Yes, that would have probably been caused from the removal of the delete. But that would be solved by keeping the delete, and fixing the issue in Xloader. (I shall go see if datapusher also has this issue)
@JVickery-TBS Keeping the delete doesn't resolve that last issue. With the delete in place, the user will be directed to a 404 page after adding the file:
but then they can navigate via the menu back to the dataset page, and the resource will be listed:
@KatiRG Yeah that is correct. That is because of the order of execution between Validation and Xloader extensions. So what happens is:
The Xloader extension needs to handle the exception of the Resource not existing during the submission to Xloader. Currently, it does not and really messes things up.
https://github.com/ckan/ckan/blob/master/ckan/model/modification.py#L67 <- there is even a comment in this model in the exception handling:
# Don't reraise other exceptions since they are generally of
# secondary importance so shouldn't disrupt the commit.
So the issue, is that Xloader is disrupting the commit here.
@KatiRG if you are using your own fork of Xloader, you can try to cherry pick from my commit PR here: https://github.com/ckan/ckanext-xloader/pull/198
Otherwise, I don't think it should take too long for them to accept this PR. I have been doing a fair amount for Xloader this past week.
Thanks @JVickery-TBS , I will review!
I think Ian Ward has had a similar issue and got around it by disabling xloader's normal loading (https://github.com/open-data/ckanext-xloader/blob/567584b1fbb82987da662e6fde8fa4054d204abf/ckanext/xloader/plugin.py#L110-L112) and then specifying in ckanext-validation to only call xloader if the validation is a success: https://github.com/open-data/ckanext-validation/blob/ee70c9b3a1606c26fd3289a9ec12c32cb5800de7/ckanext/validation/jobs.py#L108C38-L115
Yeah, we are using async for Validation, so the Resources get created no matter what so we never came across this Xloader error.
And then, yeah we disabled the default xloader submission code entirely, and then we manually call that xloader submit action in the validation extension.
I have allotted some time for myself to make this into an upstream contribution on the Xloader and Datapusher extensions. Something like ckanext.xloader.require_validation = True
and then check validation statuses before submitting to Xloader. Should be nice for everyone to have, and not very specific code in our two forks haha
Update: I have wrapped the resource_delete
in a try and this seems to have taken care of the original bug reported in this issue and the extra issue of invalid, non-existent resources being listed in the dataset page.
if new_resource:
try:
# Delete resource
t.get_action(u'resource_delete')(
{u'ignore_auth': True, 'user': None},
{u'id': resource_id}
)
except NoResultFound:
raise t.ObjectNotFound(
'This resource does not exist in the datastore')
There is a bug in this extension when creating a new resource that contains an invalid structure (e.g. duplicate column names).
After clicking on "Add", the user is directed to a non-existent page (
dataset/dataset-page-name/resource/new
) with a 404 error:What should happen
After clicking "Add", if the validation status is not "success":