teodorescuserban opened this issue 10 years ago
As I understand it, Luis/Godfrey are working on these? Is there a data team issue? If so, can we close this one?
@luiscape please comment on it.
I would leave it open, unless there is a specific repo for data team issues. :)
@teodorescuserban Interesting. Do you know what the process is for registering resources in CKAN? Is it done at the CPS level, or is there another script running on its own somewhere else?
There were 2 ways to input data into prod ckan so far.
As far as I know, there is not yet a programmatic way to publish CPS resources on CKAN.
That is my understanding as well.
Talked to Luis about this. We ran a CKAN API search on stag and prod and did not find any datasets whose URL contains "hdx-".
Adding to Godfrey's response and closing.
We searched using http://data.hdx.rwlabs.org/api/action/resource_search?query=url:hdx-1.0.0. The output is:
result: {
count: 0,
results: [ ]
}
When we search with http://data.hdx.rwlabs.org/api/action/resource_search?query=url:hdx the output is:
result: {
count: 2713,
results: [ ]
}
Even when searching the output of the latter query with Ctrl+F, no hdx-1.0.0 was found.
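A minimal Python sketch of the same check, so the Ctrl+F step does not have to be done by hand. It assumes the standard CKAN action API response envelope (`{"success": ..., "result": {"count": ..., "results": [...]}}`) and reuses the `resource_search` endpoint and `url:` query from the URLs above; that host no longer serves this API, so treat `BASE_URL` as a placeholder. The helper names are hypothetical. Note that the scan only covers the resources actually returned in `results`, which may be fewer than `count` if the server pages its output.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: the host used in this thread; point it at stag or prod as needed.
BASE_URL = "http://data.hdx.rwlabs.org/api/action/resource_search"


def build_search_url(term: str, base_url: str = BASE_URL) -> str:
    """Build a resource_search URL for a `url:<term>` query."""
    return base_url + "?" + urllib.parse.urlencode({"query": "url:" + term})


def fetch_search(term: str, base_url: str = BASE_URL) -> dict:
    """Call the CKAN action API and return the decoded JSON envelope."""
    with urllib.request.urlopen(build_search_url(term, base_url)) as resp:
        return json.load(resp)


def urls_containing(envelope: dict, needle: str) -> list:
    """Return the resource URLs in a resource_search response that contain
    `needle`, i.e. what a manual Ctrl+F over the JSON output would find."""
    results = envelope.get("result", {}).get("results", [])
    return [r["url"] for r in results if needle in (r.get("url") or "")]
```

For example, `urls_containing(fetch_search("hdx"), "hdx-1.0.0")` would list any matching URLs among the results returned by the server being queried.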
Closing.
@luiscape You reopened this. What's the latest?
Oops. I don't remember reopening it. Sleepwalking, I assume.
Closing.
Still not solved, but I guess I will just get rid of some of the trash with @cjhendrix's assistance and approval. :)
Serban, please paste the list here. I want to be doubly sure they aren't visible anywhere in CKAN. Then we can try to figure out why they are still in the db (deleted/private items, I'm guessing), and then talk about deleting them.
--cj
@teodorescuserban Is this still an issue?
Sorry, it looks like this one escaped my attention somehow. I will check tomorrow and reply, @cjhendrix.
I did a bit more investigation on this. What I want to avoid is deleting anything from the database that may still be part of a dataset, even if the dataset is deleted or private or if it's part of an old revision.
Some of these garbage URLs are definitely used in deleted datasets:
However, in Serban's query result, this url is listed as active (not deleted, like some of them).
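One way to make the "visible vs. still attached to a dataset" distinction explicit when reviewing the query results. This is a sketch under assumptions: rows are dicts keyed by the column aliases used in Serban's SQL query in this thread (`r_state`, `p_state`, `p_private`), and "visible" is approximated as the resource and its dataset both being `active` with the dataset not private. It does not look at old revisions, so anything it puts in the hidden group still needs a manual check before deletion. The function names are hypothetical.

```python
from typing import Dict, List


def is_visible(row: Dict) -> bool:
    """Assumed rule: shown in CKAN only if the resource and its dataset are
    both 'active' and the dataset is not private."""
    return (
        row["r_state"] == "active"
        and row["p_state"] == "active"
        and not row["p_private"]
    )


def partition(rows: List[Dict]) -> Dict[str, List[Dict]]:
    """Split rows into visible ones and ones hidden from CKAN but still
    attached to a (deleted or private) dataset."""
    groups: Dict[str, List[Dict]] = {"visible": [], "hidden_but_attached": []}
    for row in rows:
        key = "visible" if is_visible(row) else "hidden_but_attached"
        groups[key].append(row)
    return groups
```

Rows in the `hidden_but_attached` group are the case this comment warns about: not visible, but still part of a dataset.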
Serban, I think I need a better table, like the one you described, to troubleshoot this properly. Could you query out the following:
Dataset: package id, package_name, private, state, revision id
Resource: resource name, resource, revision id
These things have been in there for a while and don't seem to be causing problems, so there is no rush on this.
Please move it to the next sprint on Monday if I can't make it.
Still no time for that one...
Query used:
select r.id as r_id, r.name as r_name, r.url as r_url, r.state as r_state, r.revision_id as r_rev_id,
       p.id as p_id, p.name as p_name, p.url as p_url, p.state as p_state, p.private as p_private, p.revision_id as p_rev_id
from resource as r
join resource_group as g on r.resource_group_id = g.id
join package as p on g.package_id = p.id
where r.url like '%hdx-1.0.0%';
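To sanity-check what this join returns, here is a self-contained sketch that runs the same query against an in-memory SQLite database. The schema is a hypothetical, heavily trimmed stand-in with only the columns the query touches (the real CKAN tables live in PostgreSQL and have many more), the sample rows are made up, and the query is written with explicit `JOIN ... ON` clauses, which is equivalent to listing the tables with commas and filtering in `WHERE`. The function names are hypothetical.

```python
import sqlite3

# Hypothetical, trimmed-down schema: only the columns the query touches.
SCHEMA = """
CREATE TABLE package (
    id TEXT PRIMARY KEY, name TEXT, url TEXT, state TEXT,
    private BOOLEAN, revision_id TEXT
);
CREATE TABLE resource_group (id TEXT PRIMARY KEY, package_id TEXT);
CREATE TABLE resource (
    id TEXT PRIMARY KEY, name TEXT, url TEXT, state TEXT,
    revision_id TEXT, resource_group_id TEXT
);
"""

# Same columns and filter as the query in the thread.
QUERY = """
SELECT r.id AS r_id, r.name AS r_name, r.url AS r_url, r.state AS r_state,
       r.revision_id AS r_rev_id,
       p.id AS p_id, p.name AS p_name, p.url AS p_url, p.state AS p_state,
       p.private AS p_private, p.revision_id AS p_rev_id
FROM resource AS r
JOIN resource_group AS g ON r.resource_group_id = g.id
JOIN package AS p ON g.package_id = p.id
WHERE r.url LIKE '%hdx-1.0.0%'
"""


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database populated with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO package VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("p1", "active-dataset", "", "active", 0, "rev-p1"),
            ("p2", "deleted-dataset", "", "deleted", 0, "rev-p2"),
        ],
    )
    conn.executemany(
        "INSERT INTO resource_group VALUES (?, ?)",
        [("g1", "p1"), ("g2", "p2")],
    )
    conn.executemany(
        "INSERT INTO resource VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("r1", "old-url", "http://example.org/hdx-1.0.0/a.csv", "active", "rev-r1", "g1"),
            ("r2", "new-url", "http://example.org/hdx/a.csv", "active", "rev-r2", "g1"),
            ("r3", "old-url-2", "http://example.org/hdx-1.0.0/b.csv", "active", "rev-r3", "g2"),
        ],
    )
    return conn


def find_old_url_resources(conn: sqlite3.Connection) -> list:
    """Run the query and return each row as a dict keyed by column alias."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(QUERY)]
```

With the sample data, the old-style URL attached to the deleted dataset is still returned, which is exactly the case where deleting the resource row could touch a dataset that is hidden rather than gone.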
Results will arrive by email in a few minutes.
@teodorescuserban any update about this?
I've got the email. Reassigning to me.
@cjhendrix, is this still a valid issue?
I suspect the issue still exists, but whether or not it is important is for @teodorescuserban to say.
@teodorescuserban please take a look and comment
While working on caching, I listed the CPS resources published on CKAN on stag and noticed that quite a few CKAN resources point to the old CPS URLs (the ones containing hdx-1.0.0 instead of hdx).
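If the old and new CPS URLs really differ only by `hdx-1.0.0` appearing in place of `hdx`, as this comment suggests, a first-pass report of stale resources and their likely replacements could look like the sketch below. The one-for-one substitution is an assumption, each proposed URL should be checked before anything in CKAN is updated, and the function names are hypothetical.

```python
from typing import List, Tuple

OLD_SEGMENT = "hdx-1.0.0"  # old CPS marker, from this thread
NEW_SEGMENT = "hdx"        # assumed replacement; verify before relying on it


def is_old_cps_url(url: str) -> bool:
    """True if the resource URL still points at the old CPS location."""
    return OLD_SEGMENT in url


def proposed_new_url(url: str) -> str:
    """Swap the first occurrence of the old marker for the new one (assumed mapping)."""
    return url.replace(OLD_SEGMENT, NEW_SEGMENT, 1)


def stale_report(urls: List[str]) -> List[Tuple[str, str]]:
    """Pair each old-style URL with its proposed replacement; skip current URLs."""
    return [(u, proposed_new_url(u)) for u in urls if is_old_cps_url(u)]
```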
Please PM me on Skype for more details.