IATI / ckanext-iati

CKAN extension for the IATI Registry
http://iatiregistry.org

Admin UI: Publisher Search #417

Open robredpath opened 12 months ago

robredpath commented 12 months ago

As a member of IATI Support, I want to find publishers using the information I have available† so that I can quickly discover the Registry situation for someone that I'm helping.

† Organisation name (e.g. "Open Data Services"), publishing status, presence of errors in data, org-id, country

In conversation with @cormachallinanderilinx we refined this to:

Acceptance criteria

††† I don't think that we should have both the table headings and the Order By: dropdown for sorting results. We should choose one. My preference would be for the table headings to sort the whole list.

  1. Non-functional Criteria (Include availability, maintainability, performance, reliability, scalability, security, and usability criteria)

This search interface can be available to all users, apart from the ability to see unapproved publishers which should be restricted to logged-in Sysadmin users only.

EDIT: Update 2024-01-04 in line with discussions below

siwhitehouse commented 11 months ago

Thanks for raising this issue @robredpath. The main motivation for having this functionality is so that we can search across the whole set of publishers in one place, something that we don't have at the moment.

aiui the use of checkboxes as suggested makes it possible for a user to check "has published" without checking "is approved". I don't think a UI should allow this and it suggests to me that checkboxes are unsuitable here. Radio buttons with "all", "registered (unapproved)", "approved (unpublished)" and "published" or similar might be better.

In the publishers list, as it is currently implemented, the number of published datasets is shown. I think this is useful and would like to see it retained here, please.

What is the purpose of "presence of errors in data"? Is this a Boolean, numerical or some other type of field?

What is the logic for excluding unapproved publishers from non-Sysadmin users, please? At the moment sometimes people try to register the same organisation more than once. Not allowing people to check if their organisation is in the registration process is likely to increase the number of duplicate registrations that we see.

robredpath commented 11 months ago

Radio buttons with "all", "registered (unapproved)", "approved (unpublished)" and "published" or similar might be better.

I don't have strong feelings on this. I agree that the UI allowing the user to select (hopefully!) impossible combinations isn't ideal, but having a single control that selects based on a combination of fields doesn't feel great either. We could make some JavaScript to auto-select "is approved" if they select "is published"?

In the publishers list, as it is currently implemented, the number of published datasets is shown. I think this is useful and would like to see it retained here, please.

Agreed. Implicit - but should be explicit - is that there's no loss of display or other functionality as a result of this change.

What is the purpose of "presence of errors in data"? Is this a Boolean, numerical or some other type of field?

As I understand it, this is to make it easier to locate the correct publisher in a list where other search terms might lead to lots of results, and it stops someone having to click through a long list of publishers one-by-one to see if they have errors in their data. I would expect the control to be a Boolean, and the results to either be Boolean or numeric, depending on implementation considerations

What is the logic for excluding unapproved publishers from non-Sysadmin users, please?

This is a security consideration: if someone creates a publisher for some non-IATI-related purpose (such as to advertise their gambling website, or some illegal pursuit) then we don't want any content that they create, even whatever they entered into the publisher name field, to be displayed on the website until it's been reviewed.

Not allowing people to check if their organisation is in the registration process is likely to increase the number of duplicate registrations that we see.

Perhaps a mitigation to this might be to indicate if the search term appears in unapproved publishers, without actually listing them? Some carefully-crafted help text would be required to explain what was going on, however.
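The mitigation suggested above could be sketched roughly as follows. This is a minimal illustration only: the `Publisher` class, its fields and `search_publishers` are hypothetical names, not part of ckanext-iati. Note that even a bare count reveals that some unapproved publisher matches the term, so whether that is acceptable is a judgment for the security review.

```python
from dataclasses import dataclass


@dataclass
class Publisher:
    name: str       # short identifier (hypothetical field)
    title: str      # display name (hypothetical field)
    approved: bool  # False while awaiting sysadmin review


def search_publishers(publishers, term, is_sysadmin=False):
    """Return (visible matches, number of hidden unapproved matches)."""
    needle = term.casefold()
    matches = [
        p for p in publishers
        if needle in p.title.casefold() or needle in p.name.casefold()
    ]
    if is_sysadmin:
        # Sysadmins see everything, so nothing is hidden from them.
        return matches, 0
    visible = [p for p in matches if p.approved]
    return visible, len(matches) - len(visible)


publishers = [
    Publisher("ods", "Open Data Services", approved=True),
    Publisher("ods-2", "Open Data Services Ltd", approved=False),
]
visible, hidden = search_publishers(publishers, "open data")
if hidden:
    # Help text only; no unapproved content is echoed back to the user.
    print("Some matching publishers are still awaiting approval.")
```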

siwhitehouse commented 11 months ago

Radio buttons with "all", "registered (unapproved)", "approved (unpublished)" and "published" or similar might be better.

I don't have strong feelings on this. I agree that the UI allowing the user to select (hopefully!) impossible combinations isn't ideal, but having a single control that selects based on a combination of fields doesn't feel great either. We could make some JavaScript to auto-select "is approved" if they select "is published"?

I think what I am suggesting is a single control that allows someone to either search across all categories of publisher, or just one category. Let's move on and see what @cormachallinanderilinx suggests when implementing this.

snip

What is the purpose of "presence of errors in data"? Is this a Boolean, numerical or some other type of field?

As I understand it, this is to make it easier to locate the correct publisher in a list where other search terms might lead to lots of results, and it stops someone having to click through a long list of publishers one-by-one to see if they have errors in their data. I would expect the control to be a Boolean, and the results to either be Boolean or numeric, depending on implementation considerations

I'm still unclear about the use case for a member of IATI support to be using this. Should we be considering how this particular filter interacts, or doesn't, with http://dashboard.iatistandard.org/data_quality.html ?

What is the logic for excluding unapproved publishers from non-Sysadmin users, please?

This is a security consideration: if someone creates a publisher for some non-IATI-related purpose (such as to advertise their gambling website, or some illegal pursuit) then we don't want any content that they create, even whatever they entered into the publisher name field, to be displayed on the website until it's been reviewed.

Thanks. That makes sense.

Not allowing people to check if their organisation is in the registration process is likely to increase the number of duplicate registrations that we see.

Perhaps a mitigation to this might be to indicate if the search term appears in unapproved publishers, without actually listing them? Some carefully-crafted help text would be required to explain what was going on, however.

It feels tricky and something that will be difficult to implement. Let's see what options @cormachallinanderilinx can offer us.

siwhitehouse commented 11 months ago

#310 paraphrased here:

Add a Date Created column to this list. This will help us determine which are the newly created publishers that are waiting for approval.

Can this be added to the acceptance criteria, please?

robredpath commented 10 months ago

Thanks @siwhitehouse. I've updated the initial comment in line with our conversation here.

Two unresolved points, though:

What is the purpose of "presence of errors in data"? Is this a Boolean, numerical or some other type of field?

As I understand it, this is to make it easier to locate the correct publisher in a list where other search terms might lead to lots of results

I'm still unclear about the use case for a member of IATI support to be using this. Should we be considering how this particular filter interacts, or doesn't, with http://dashboard.iatistandard.org/data_quality.html ?

I'm not sure where this requirement came from. Maybe @IsabelBirds might know? I have no attachment to it, it's just in the Miro board so it's made its way here!

Not allowing people to check if their organisation is in the registration process is likely to increase the number of duplicate registrations that we see.

To clarify: would this make things worse, or is this the current situation (and so this just doesn't make things better)?

IsabelBirds commented 10 months ago

The error field was an idea to reduce the amount of digging we have to do to offer support, e.g. an error count per activity from the validator.

Then if I'm already engaged with an org and can easily notice that they have errors, I can bring this up and offer support. This is likely to increase uptake and changes to data quality compared to contacting orgs out of the blue.

cormachallinanderilinx commented 10 months ago

On the "is approved", "has published" and "has errors" filters: after spending some time on this, I'm not sure this is the correct place for them. The publisher search page only returns approved publishers, so that their datasets can be viewed. I think it's probably a good idea to keep it like this.

Pending publishers can already be viewed at https://www.iatiregistry.org/dashboard/mypublishers-pending and I could add another tab (or similar) to show all publishers that haven't published, and possibly the publishers with errors (although this may not be as straightforward). Then it is kept to the dashboard, which is only available to admins. There isn't a button to access the dashboard on the Registry, so it probably makes sense to link that to the UI.

Should we just use this ticket to improve the searching (fuzzy logic) and fix the sorting?

robredpath commented 10 months ago

I think we can return to the user story to help us here:

As a member of IATI Support, I want to find publishers using the information I have available† so that I can quickly discover the Registry situation for someone that I'm helping.

The end state that we're trying to get to here is a situation where, when someone contacts IATI Support, we can quickly understand which Registry publisher(s) correspond to the person and/or organisation who has contacted us, and what the current state of them is.

Ideally, I think that would be part of the existing publisher search, because then there's just one place that you go to look for information about publishers. However, I think we're open to it being a separate admin tool if that's more straightforward in terms of implementation and security.

If the information is split across multiple tabs or multiple searches it becomes harder to use: at best you need to do the search multiple times, and it becomes very easy for people to either not know about or forget to use the other tabs.

Is that feasible: a tab in the dashboard which supports all the functionality that we've discussed here in a single view that's admin-only?

cormachallinanderilinx commented 10 months ago

yes, that sounds good to me

robredpath commented 10 months ago

Cool - I want to hear from @siwhitehouse before we proceed, though!

cormachallinanderilinx commented 10 months ago

Estimate 3 days

siwhitehouse commented 10 months ago

The error field was an idea to reduce the amount of digging we have to do to offer support, e.g. an error count per activity from the validator.

Then if I'm already engaged with an org and can easily notice that they have errors, I can bring this up and offer support. This is likely to increase uptake and changes to data quality compared to contacting orgs out of the blue.

I'm not clear still, my apologies. How could we show a per-activity error count when the search is at publisher level?

siwhitehouse commented 10 months ago

@robredpath I don't think we should have a single control for "is approved", "has published", "has errors in data".

I think we want to be able to filter by the three statuses that a publisher might be in: "registered (unapproved)", "approved (unpublished)" and "published". By default, a search should show all statuses. Either a single control, or a set of controls, should let us filter by status.

Separately, we want to be able to filter by whether a publisher has errors in its set of published files. That should show the number of errors, which we should be able to order on. @IsabelBirds have I specified what you have in mind here?

Thanks to @cormachallinanderilinx for the estimate.
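To illustrate the three-status model described above: a single derived status avoids the impossible checkbox combination ("has published" without "is approved"). The sketch below assumes a publisher can be summarised by an `approved` flag and a dataset count; `PublisherStatus`, `status_of` and `filter_by_status` are hypothetical names, not existing Registry code.

```python
from enum import Enum


class PublisherStatus(Enum):
    REGISTERED = "registered (unapproved)"
    APPROVED = "approved (unpublished)"
    PUBLISHED = "published"


def status_of(approved: bool, dataset_count: int) -> PublisherStatus:
    """Derive exactly one status, so contradictory states cannot be expressed."""
    if not approved:
        return PublisherStatus.REGISTERED
    if dataset_count == 0:
        return PublisherStatus.APPROVED
    return PublisherStatus.PUBLISHED


def filter_by_status(publishers, wanted=None):
    """publishers: list of (title, approved, dataset_count) tuples.

    wanted=None means "all statuses", matching the proposed default.
    """
    return [
        p for p in publishers
        if wanted is None or status_of(p[1], p[2]) in wanted
    ]


rows = [("A", False, 0), ("B", True, 0), ("C", True, 12)]
print(filter_by_status(rows, {PublisherStatus.PUBLISHED}))
```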

siwhitehouse commented 10 months ago

I had misinterpreted @IsabelBirds' latest comment.

What we would like is the mean average of errors per activity for the publisher as a column in the search results. That figure should also contain a link to the publisher's page on the IATI Validator, please.
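As a worked example of that figure, one reading of "mean average of errors per activity for the publisher" is total errors divided by total activities across all of the publisher's files. The sketch assumes an error count and an activity count are available for each file; where those numbers come from is not settled at this point in the thread, and `mean_errors_per_activity` is a hypothetical helper.

```python
def mean_errors_per_activity(files):
    """files: iterable of (error_count, activity_count) pairs, one per file.

    Returns total errors / total activities, or None if there are no activities.
    """
    files = list(files)
    total_errors = sum(errors for errors, _ in files)
    total_activities = sum(activities for _, activities in files)
    if total_activities == 0:
        return None
    return total_errors / total_activities


# Three illustrative files: 15 errors across 200 activities in total.
print(mean_errors_per_activity([(12, 100), (0, 50), (3, 50)]))  # 0.075
```

This weights every activity equally. Averaging the per-file means instead would give a different number for the same data, so the definition is worth agreeing before implementation.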

cormachallinanderilinx commented 10 months ago

Is this the URL you would like included? https://validator.iatistandard.org/organisation/aiddata

Do you know if there is a Validator API that we can use to get a count of errors, as we don't store the count?

To the best of my knowledge the Validator only exposes two APIs: https://developer.iatistandard.org/api-details#api=iati-validator-v2&operation=get-pub-get-report

robredpath commented 10 months ago

Do you know if there is a Validator API that we can use to get a count of errors, as we don't store the count?

The Validator API returns report.summary.critical which is a count of "critical" (i.e. structural validation) errors. We might also want to include report.summary.error (ruleset errors that contain "must", according to the validator docs) - any views @IsabelBirds @siwhitehouse ?

These are on a per-file basis; the way that we use CKAN in the Registry means that "file" and "dataset" are synonymous.
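Because the counts are per file, a publisher-level figure means summing across that publisher's datasets. A minimal sketch, assuming each Validator response has already been parsed into a dict shaped like `{"report": {"summary": {"critical": ..., "error": ...}}}` (that nesting is inferred from the `report.summary.*` names above, and `publisher_error_totals` is a hypothetical helper):

```python
def publisher_error_totals(reports, keys=("critical", "error")):
    """Sum the chosen summary counts over all of a publisher's file reports."""
    totals = {key: 0 for key in keys}
    for report in reports:
        summary = report.get("report", {}).get("summary", {})
        for key in keys:
            # A missing key is treated as zero rather than as a failure.
            totals[key] += int(summary.get(key, 0))
    return totals


reports = [
    {"report": {"summary": {"critical": 0, "error": 4}}},
    {"report": {"summary": {"critical": 1, "error": 1}}},
]
print(publisher_error_totals(reports))  # {'critical': 1, 'error': 5}
```

Keeping the counts separate in the return value leaves open whether the UI shows them individually or as one combined number.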

The pipeline that feeds the Validator starts with the Registry, so any file that exists on the Registry should have an entry in the Validator. There will be a time lag, I'm not sure what it is precisely, but it won't be long! @simon-20 or @odscjames might be able to advise.

Likewise, the Registry should know about updates to files first out of any of our systems. I'm not sure if there's an edge case where a file at an unchanged URL has been updated; again I hope that @simon-20 or @odscjames can advise on that.

We discussed on the call that this could result in a lot of API calls if the results page has a lot of publishers on, each of whom have a lot of datasets. Given that the Registry knows about changes to files first, it should be fine to cache results and invalidate the cache based on Registry / archiver updates. The API isn't actually as fast as I thought (I'm seeing 300-400ms response times); we can look into improving that but caching will likely be important. The API should support a reasonable number of concurrent queries, which would hopefully speed up total time to compile the list.
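The caching idea above could look roughly like this. It is a sketch under assumptions: `fetch_summary` stands in for whatever function calls the Validator API for one dataset, and the "version" is any marker the Registry updates when a file changes (a hash or last-modified timestamp, say). Neither name comes from the Registry codebase.

```python
from concurrent.futures import ThreadPoolExecutor


class ValidationCache:
    def __init__(self, fetch_summary):
        self._fetch = fetch_summary  # callable(dataset_id) -> summary
        self._store = {}             # dataset_id -> (version, summary)

    def get_many(self, datasets, max_workers=8):
        """datasets: dict of dataset_id -> current version marker.

        Only datasets that are missing or whose version changed are
        re-fetched, and those fetches run concurrently.
        """
        stale = [
            dataset_id for dataset_id, version in datasets.items()
            if dataset_id not in self._store
            or self._store[dataset_id][0] != version
        ]
        if stale:
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                for dataset_id, summary in zip(stale, pool.map(self._fetch, stale)):
                    self._store[dataset_id] = (datasets[dataset_id], summary)
        return {dataset_id: self._store[dataset_id][1] for dataset_id in datasets}
```

With response times of a few hundred milliseconds per file, the concurrency only helps on a cold cache; once warm, a results page costs no API calls until the Registry records a change.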

robredpath commented 10 months ago

@cormachallinanderilinx do you already have an IATI API key? We can help you get signed up and increase your access level once you're up and running if not.

robredpath commented 10 months ago

This gist is an example response for a file with several ruleset errors, but that is valid IATI data. The summary elements are at the end of the response.

robredpath commented 10 months ago

I have opened several issues against the Validator API repos for us to investigate whether we can make the Validator API more suitable for this use. Depending on complexity and how well this sits alongside other work we're doing on the Validator API, we may be able to make these changes very quickly, or not for several months.

The issues are:

  1. Allow querying of multiple files at once
  2. Speed up response
  3. Allow users to request just particular elements of the response

simon-20 commented 10 months ago

The pipeline that feeds the Validator starts with the Registry, so any file that exists on the Registry should have an entry in the Validator. There will be a time lag, I'm not sure what it is precisely, but it won't be long! @simon-20 or @odscjames might be able to advise.

I did a quick check, and there is a fair bit of variation. Over half of the datasets currently known about by the Datastore were validated within 30 minutes; but there is a long tail on this one, some can take a few hours, and if there is a problem--a publisher is flagged, for instance, for too much invalid data too quickly--then full validation may take much longer.

cormachallinanderilinx commented 10 months ago

Hi @robredpath yes I have an API key set up for some work we were looking into previously

robredpath commented 10 months ago

@odscjames could you get in touch with @cormachallinanderilinx via email and make sure that we know which is Derilinx' API key and that it has appropriately high limits? I want to make sure we're ahead of any rate limiting complications.

siwhitehouse commented 10 months ago

Do you know if there is a Validator API that we can use to get a count of errors, as we don't store the count?

The Validator API returns report.summary.critical which is a count of "critical" (i.e. structural validation) errors. We might also want to include report.summary.error (ruleset errors that contain "must", according to the validator docs) - any views @IsabelBirds @siwhitehouse ?

We discussed this and we prefer to have them both included, please.

What about Warnings @robredpath ? Are they queryable through the API too?

These are on a per-file basis; the way that we use CKAN in the Registry means that "file" and "dataset" are synonymous.

So, is it possible to get a mean average of errors per activity then?

robredpath commented 10 months ago

We discussed this and we prefer to have them both included, please.

Is it more useful for them to be provided separately, or added together into one aggregate figure? I'm conscious that fixing a structural issue might then allow validation to proceed to the point where many warnings are triggered, so this number might appear to get worse as the data is actually improving.

What about Warnings @robredpath ? Are they queryable through the API too?

Yes, report.summary.warning provides that figure.

So, is it possible to get a mean average of errors per activity then?

@cormachallinanderilinx this one's for you!

odscjames commented 10 months ago

@odscjames could you get in touch with @cormachallinanderilinx via email and make sure that we know which is Derilinx' API key and that it has appropriately high limits?

I've found the API key and it looks like it is already at high limits.

cormachallinanderilinx commented 6 months ago

I have had to do quite a bit of refactoring here. Here are some examples of the searching I added:

Searching

Search by name or title: this searches all publishers where the name is 'like' test:
https://staging.iatiregistry.org/publisher/?q=test
This one exactly matches a title from the DB:
https://staging.iatiregistry.org/publisher/?q=pub_737839

Search by country:
https://staging.iatiregistry.org/publisher/?q=publisher_country%3DAS&sort=title+asc
The search works by country code, but I'm thinking of adding a dropdown (maybe using select2) that displays the country names and uses the code as the value to run the query.

Search by publisher id:
https://staging.iatiregistry.org/publisher/?q=publisher_iati_id%3Dtest_publisher_id_date&sort=title+asc
OR get all with publisher_:
https://staging.iatiregistry.org/publisher/?q=publisher_iati_id%3Dpublisher_&sort=title+asc

It seems that searching on both country and publisher_id at the same time is breaking with my latest changes. I will fix this and update here.
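For anyone scripting these checks, the URLs above can be built with the standard library rather than hand-encoding `%3D` and `+`. The field names (`publisher_country`, `publisher_iati_id`) are taken from the examples above; the helper `publisher_search_url` is a hypothetical convenience, not part of the Registry.

```python
from urllib.parse import urlencode

BASE = "https://staging.iatiregistry.org/publisher/"


def publisher_search_url(q="", sort=None, page=None, base=BASE):
    """Build a publisher search URL; urlencode handles '=' and spaces."""
    params = {"q": q}
    if sort:
        params["sort"] = sort
    if page:
        params["page"] = page
    return base + "?" + urlencode(params)


print(publisher_search_url("publisher_country=AS", sort="title asc"))
# https://staging.iatiregistry.org/publisher/?q=publisher_country%3DAS&sort=title+asc
```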

Sorting

Paging is not working in the UI, so just update the URL for now. Also, when the results are rendered on the page they are not in alphabetical order, although the paging is correct (see the third example for publisher_country: page 11 goes from A to B, but Bangladesh is first with some A's after it, while page 12 is all B).

publisher_country
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_country+asc&page=1
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_country+asc&page=10
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_country+asc&page=11

publisher_iati_id - this example is descending
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_iati_id&page=1
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_iati_id&page=2
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_iati_id&page=3

publisher_organization_type
https://staging.iatiregistry.org/publisher/?q=&sort=publisher_organization_type&page=1

@siwhitehouse and @robredpath if ye would like to have a play around and give me any feedback on how you would like this better implemented in the UI, please let me know. As you can see, when you go to one of the URLs it auto-populates the search box (this is the CKAN default), but from trying it out ye might get some ideas.

I'm going to work on tests and fix the known issues mentioned above before working on the UI, so ye can get a feel for it and provide any feedback.

cormachallinanderilinx commented 6 months ago

https://iatiregistry.org/dashboard/recent-publishers?q=tearfund&sort=publisher_first_publish_date+desc

siwhitehouse commented 6 months ago

Searching

Search by name or title:

https://staging.iatiregistry.org/publisher/?q=bank returns two entries: World Bank and Caribbean Development Bank

https://iatiregistry.org/publisher/?q=bank returns nine entries, including African Development Bank

https://staging.iatiregistry.org/publisher/?q=afdb returns African Development Bank, suggesting that the new search is failing to find as many organisations as the current one.

cormachallinanderilinx commented 5 months ago

@siwhitehouse Sorting: by default the most recently created publisher comes first. After searching you can select a different sort from the dropdown.

Screenshot 2024-06-27 at 15 27 05

Search

The publisher search will search by default on name, title and IATI publisher ID. Example: https://staging.iatiregistry.org/publisher/?q=bank&sort=created+desc This will check the above 3 fields.

If you want to search for an exact IATI publisher ID you can also add it here and it should match. Example: https://staging.iatiregistry.org/publisher/?q=XI-IATI-WBTF%09&sort=created+desc It will also return results where a name or title matches. If you want a specific match on publisher id you can use

Country

Country search is a bit different, as you cannot search on country name. I just got an idea on this as I'm typing; I will look into something and update you when I have checked it out.

Paging

It should now work properly and keep the sort selected from the dropdown, where before it could go from A on page 1 to D on page 2 and A again on page 3. The paging at the bottom has disappeared; I'll work to get this back in. To go to the next page, add &page=2 to the end of the URL, or &page=3 for page 3. Example: search 'a', which will return loads of results. https://staging.iatiregistry.org/publisher/?q=a&sort=created+desc starts at page 1, so add the page parameter to the end: https://staging.iatiregistry.org/publisher/?q=a&sort=created+desc&page=2

State

State is displayed only if you are logged in as a sysadmin. The 2 states are

  1. active
  2. approval needed

This is the URL to get all publishers that need approval. I think we should have a checkbox, button or whatever makes sense to add this query through the UI. See: https://staging.iatiregistry.org/publisher/?q=&state=approval_needed

siwhitehouse commented 4 months ago

Hello @cormachallinanderilinx - thank you for the update and my apologies for not responding sooner.

Sorting, no pagination

I searched for the word "foreign" and was returned nineteen organisations. Handy for testing the sorting without pagination.

Created date - looks good

Name order - looks good

IATI org identifier - XM-DAC-21-1 is placed before XM-DAC-3-1 - this is probably ok

Organisation type - 18 are government so not a great test, but asc and desc both looked good. I'm not sure we need this sort, to discuss

Country/Region - looks good
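The XM-DAC-21-1 / XM-DAC-3-1 ordering is what a plain character-by-character sort produces, because "2" sorts before "3". If a numeric-aware ("natural") ordering were ever preferred, it could be sketched like this; `natural_key` is a hypothetical helper and nothing here reflects how the Registry currently sorts.

```python
import re


def natural_key(identifier):
    """Split into text and number runs so 3 sorts before 21."""
    parts = re.split(r"(\d+)", identifier)
    # re.split with a capture group puts the digit runs at odd indexes.
    return [
        int(part) if index % 2 else part.casefold()
        for index, part in enumerate(parts)
    ]


ids = ["XM-DAC-3-1", "XM-DAC-21-1", "GB-CHC-1000566", "30001"]
print(sorted(ids))                   # plain string order
print(sorted(ids, key=natural_key))  # numeric-aware order
```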

Sorting, with pagination

I searched for the word "the" and was returned eight pages of results.

Name

I sorted on "name ascending" and it looked fine until the last entry "National Association of Municipalities of Benin (ANCB), The" a bit of a jump from the previous result of "Doctors of the World / Medecins du Monde".

Clicking on the second page led to a page sorted by "Created Descending".

To see the second page of results sorted by name ascending I typed https://staging.iatiregistry.org/publisher/?q=the&sort=name+asc&page=2 directly into the address bar. The first entry was Doctors of the World UK.

I think my description of how pagination currently behaves is different to yours, but this may be due to changes you've made since your update. Could you follow my steps, see if you can replicate and look at why "National Association of Municipalities of Benin (ANCB), The" appears out of order, please?

IATI Organisation Identifier

Using the dropdown menu, all numerical codes (e.g. '30001') start appearing after 'GB-CHC-1000566'.

Organisation type

Ascending starts from "government". Descending starts from "Academic, Training and Research". I suspect we are ordering by the code value rather than the name. See https://iatistandard.org/en/iati-standard/104/codelists/organisationtype/
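The suspicion above (sorting on the code rather than the displayed name) can be illustrated as follows. The mapping is a small excerpt of the OrganisationType codelist and should be checked against the codelist linked above; the publisher dicts and `sort_by_org_type_name` are illustrative, though `publisher_organization_type` is the field name used in the sort URLs earlier in this thread.

```python
# Excerpt only; the full list lives in the IATI OrganisationType codelist.
ORG_TYPE_NAMES = {
    "10": "Government",
    "21": "International NGO",
    "40": "Multilateral",
    "80": "Academic, Training and Research",
}


def sort_by_org_type_name(publishers, descending=False):
    """Sort on the human-readable name that the user actually sees."""
    return sorted(
        publishers,
        key=lambda p: ORG_TYPE_NAMES.get(
            p["publisher_organization_type"], ""
        ).casefold(),
        reverse=descending,
    )


pubs = [
    {"title": "P1", "publisher_organization_type": "80"},
    {"title": "P2", "publisher_organization_type": "10"},
    {"title": "P3", "publisher_organization_type": "40"},
]
by_code = sorted(pubs, key=lambda p: p["publisher_organization_type"])
by_name = sort_by_org_type_name(pubs)
print([p["title"] for p in by_code])  # code order starts with Government
print([p["title"] for p in by_name])  # name order starts with Academic...
```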

Country/Region

South Africa appears between United States and Uganda. United Kingdom appears between Netherlands and Nigeria

cormachallinanderilinx commented 4 months ago

@siwhitehouse Sorry, I was working on the pagination button yesterday evening. I got it displaying but the click still isn't fully working. I should have left a comment.

Name

This is fixed. The ordering was correct, but the name in the database starts with "The" and the UI was 'normalising' it by putting "The" at the end. I'm guessing we don't want this anymore as we are adding alphabetical sorting?

IATI Organisation Identifier

Fixed. I had to add in a new version of tablesorter.js. The previous version was 11 years old.

Organisation type

Yes, you're correct. This should be fixed.

Country/Region

Fixed

siwhitehouse commented 4 months ago

@cormachallinanderilinx My apologies in turn for the delay in getting back to you.

I assume the pagination still isn't ready for testing

Name

I don't understand your question. At the moment it looks to me that you no longer 'normalise' names by placing 'the's at the end of a name. I think that is good for the display, but I suspect we would still want to sort on the 'normalised' version. At the moment all organisations whose names begin with "The" are ordered using it, meaning they are all bunched together.

@robredpath can you advise on best practice here, please?
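One way to get both behaviours described above (display the name as stored, but order on a normalised version) is to normalise only the sort key. A minimal sketch, assuming the only normalisation wanted is ignoring a leading "The"; `name_sort_key` is a hypothetical helper and the titles are ones mentioned in this thread.

```python
def name_sort_key(title):
    """Sort key that ignores a leading 'The' without changing the display."""
    stripped = title.strip()
    words = stripped.split(None, 1)
    if len(words) == 2 and words[0].casefold() == "the":
        return words[1].casefold()
    return stripped.casefold()


titles = [
    "World Bank",
    "The National Association of Municipalities of Benin (ANCB)",
    "Open Data Services",
    "Doctors of the World UK",
    "Caribbean Development Bank",
]
for title in sorted(titles, key=name_sort_key):
    print(title)  # shown unmodified, ordered as if "The" were absent
```

Whichever rule is chosen, applying the same key function everywhere names are sorted would keep the behaviour consistent across the site.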

IATI Organisation Identifier

Yes, that looks better now.

Organisation Type

and

Country/Region

Thanks these both look good now.

That leaves pagination to be fixed and Name ordering we should wait for Rob's opinion.

siwhitehouse commented 4 months ago

@cormachallinanderilinx

I have a couple of comments about styling/layout.

Table settings

The table looks like it has fixed-width columns. Here is a screenshot of the top of the table when I search on 'development'

image

Could we configure the table display so that it avoids such text wrapping? From a fixed-width perspective, I think we could give the left-hand columns more width by taking it from those on the right-hand side. Ideally, the table would adapt to the person's browser/display settings. I don't know the possibilities and limitations of an approach like this though.

Ordering by table header

I can still do this, but the text in the table header doesn't afford clicking. The sorting is on-page only and not across the whole of the returned data.

I would like ordering by table header to be clear to the user and for it to perform the same sorting as the dropdown i.e. across all of the returned data.

cormachallinanderilinx commented 4 months ago

@siwhitehouse

I don't understand your question. At the moment it looks to me that you no longer 'normalise' names by placing 'the's at the end of a name. I think that is good for the display, but I suspect we would still want to sort on the 'normalised' version. At the moment all organisations whose names begin with "The" are ordered using it, meaning they are all bunched together.

Yes, I have removed the normalisation, so we will leave this until Rob gives his opinion?

I have fixed the pagination. I'm still doing a bit of testing myself but it looks good.

On the table header clicks I will look at this now.

robredpath commented 4 months ago

@robredpath can you advise on best practice here, please?

I think that being clear about the normalisation and having it be consistent across the site is more important than whichever approach we choose - so, whatever we do elsewhere is what we should do here.

siwhitehouse commented 4 months ago

We discussed this on our call today. @cormachallinanderilinx will remove the sort from the column headers in the table, @siwhitehouse will check the pagination and then share this with the rest of IATI Support for feedback.

siwhitehouse commented 4 months ago

Pagination looks good now, thank you @cormachallinanderilinx

Is the API set up for the Staging instance? I'm asking because if I query

https://iatiregistry.org/api/action/organization_list?all_fields=true

then I get a list of organisations, but if I query

https://staging.iatiregistry.org/api/action/organization_list?all_fields=true

I get a 401 response. We'd like to be able to check the organisations in the "approval needed" state through the API and the UI.

cormachallinanderilinx commented 3 months ago

@siwhitehouse Since it is the staging URL, the 401 suggests to me that you need the basic auth credentials that you enter when you go to the staging site. If you are using Python or a tool like Postman you will also need to add the basic auth there.

Python example:

import requests
from requests.auth import HTTPBasicAuth

# 'user' and 'password' are the staging site's basic auth credentials
res = requests.post(
    'https://staging.iatiregistry.org/api/action/organization_list?all_fields=true',
    auth=HTTPBasicAuth('user', 'password'),
)
print(res)

Postman: You will need to set it as well, there should be a basic auth option under Authorization

Screenshot 2024-07-30 at 17 41 09

siwhitehouse commented 3 months ago

@cormachallinanderilinx Thank you. Unfortunately, I am still receiving a

<Response [401]>

error when I follow your instructions.

I logged into my https://staging.iatiregistry.org/user/simonwhitehouse account and I created an API token. I then amended the code you posted above to include my username and API token. Running the code returns the 401.

I have just shared the code with you (via Deepnote) for you to troubleshoot.

I'd note that originally I was sending this as a get request without authentication, as per https://iatistandard.org/en/iati-tools-and-resources/iati-registry/iati-registry-api/publisher-endpoints/#ListPub

cormachallinanderilinx commented 3 months ago

Hi @siwhitehouse The issue wasn't adding the API token, it was the Basic Auth.

Thanks for sharing the Deepnote, I was able to fix it up there along with 1 or 2 small changes.

FYI the API will have paging (offset and limit), by default the limit is 20 so the next page will be: https://staging.iatiregistry.org/api/action/organization_list?all_fields=true&offset=20&limit=20

You can also set a higher limit but response will be slower, example of 100 at a time: https://staging.iatiregistry.org/api/action/organization_list?all_fields=true&offset=0&limit=100 To get the next 100 we set the offset to 100: https://staging.iatiregistry.org/api/action/organization_list?all_fields=true&offset=100&limit=100
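Paging through the whole list with `offset` and `limit` could be sketched as below. Assumptions are labelled: `fetch_all` and `make_fetcher` are hypothetical helpers, the `"result"` key is the usual CKAN action API envelope, and the credentials are the staging basic auth pair discussed above. The loop stops when a page comes back shorter than `limit`.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_all(fetch_page, limit=100):
    """fetch_page(offset, limit) -> list; keep going until a short page."""
    results = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        results.extend(page)
        if len(page) < limit:
            return results
        offset += limit


def make_fetcher(api_url, user, password):
    """Build a fetch_page function that sends HTTP basic auth (not run here)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch_page(offset, limit):
        query = urlencode({"all_fields": "true", "offset": offset, "limit": limit})
        request = Request(
            f"{api_url}?{query}",
            headers={"Authorization": f"Basic {token}"},
        )
        with urlopen(request) as response:
            # Assumes the standard CKAN envelope: {"success": ..., "result": [...]}
            return json.load(response)["result"]

    return fetch_page
```

Usage would be along the lines of `fetch_all(make_fetcher("https://staging.iatiregistry.org/api/action/organization_list", "user", "password"))`, with the real credentials substituted.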

siwhitehouse commented 3 months ago

When logged in as https://staging.iatiregistry.org/user/simonwhitehouse I receive an internal server error when I click on the link to the last page (117). I also receive an internal server error when I select Order By "Created Ascending".

@cormachallinanderilinx I don't know why I am seeing these now when I didn't before. Can you investigate this, please? Let me know if you need any information from me.

siwhitehouse commented 3 months ago

When logged in as https://staging.iatiregistry.org/user/simonwhitehouse I see twenty publishers per page and (I assume) 116 pages return results. So, I would expect to see 2320-2340 publishers in the CSV download. I only see 1357.

This is my alternative check on the number of organisations appearing in the UI matching those in the database, as I don't have the coding skills to page through the API.

@cormachallinanderilinx I think the check here should be that the data in the CSV download matches that returned in the UI via the query in the URL. The API should also be consistent. This doesn't appear to be the case at the moment. Can you investigate this before we do any more testing, please? Happy to provide more information if you need it.
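The count check above can be sketched in a few lines. This is only an illustration: it assumes the CSV download has a single header row, and the function names are made up for the example.

```python
import csv
import io


def expected_bounds(pages_with_results, last_page, per_page):
    """Lower bound: pages known to return results. Upper bound: every page full."""
    return pages_with_results * per_page, last_page * per_page


def count_csv_rows(csv_text):
    """Count data rows in CSV text, assuming the first row is a header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)


low, high = expected_bounds(pages_with_results=116, last_page=117, per_page=20)
print(low, high)            # 2320 2340
print(low <= 1357 <= high)  # False: the CSV count falls outside the expected range
```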

cormachallinanderilinx commented 2 months ago

Hi @siwhitehouse, good catch. It looks like this may be an overall issue, as the download is completely separate from the actual publisher code. I will work on building the publisher search code into the download functionality as well; I think it completely makes sense to do this to make sure they are the same.

cormachallinanderilinx commented 2 months ago

Hi @siwhitehouse, I just wanted to note that I have done some work on aligning the downloads. However, properly aligning them will probably take another 1-2 days' work. It is working, but it is very slow and will result in a lot of timeout errors. The reason is fetching the GroupExtra details: for the page load we do this with an organization_show call per publisher, which is fine for 20 items on a page but is an issue for a couple of hundred. I just want to check ye are OK with me continuing this work as part of this ticket, or would ye rather it be done separately?
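To illustrate the difference between one lookup per publisher and a single batched lookup, here is a rough sketch using `sqlite3`. The `group_extra` table with `group_id`, `key` and `value` columns is a simplified stand-in for illustration, not the real CKAN schema or the extension's code, and `extras_for_groups` is a hypothetical helper.

```python
import sqlite3
from collections import defaultdict


def extras_for_groups(conn, group_ids):
    """Fetch the extras for many groups in one query instead of one query per group."""
    group_ids = list(group_ids)
    if not group_ids:
        return {}
    placeholders = ",".join("?" for _ in group_ids)
    rows = conn.execute(
        "SELECT group_id, key, value FROM group_extra "
        f"WHERE group_id IN ({placeholders})",
        group_ids,
    )
    extras = defaultdict(dict)
    for group_id, key, value in rows:
        extras[group_id][key] = value
    return dict(extras)
```

With a couple of hundred publishers this is one round trip to the database rather than a couple of hundred, which is usually where the time goes.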

siwhitehouse commented 2 months ago

> Hi @siwhitehouse, good catch. It looks like this may be an overall issue, as the download is completely separate from the actual publisher code. I will work on building the publisher search code into the download functionality as well; I think it completely makes sense to do this to make sure they are the same.

What do you mean by the "actual publisher code" here, please Cormac?

siwhitehouse commented 2 months ago

> Hi @siwhitehouse, I just wanted to note that I have done some work on aligning the downloads. However, properly aligning them will probably take another 1-2 days' work. It is working, but it is very slow and will result in a lot of timeout errors. The reason is fetching the GroupExtra details: for the page load we do this with an organization_show call per publisher, which is fine for 20 items on a page but is an issue for a couple of hundred. I just want to check ye are OK with me continuing this work as part of this ticket, or would ye rather it be done separately?

Hi @cormachallinanderilinx

aiui we have two use cases for aligning the downloads:

  1. As a check that the new Publisher search UI is returning a full and correct set of results
  2. Because they should provide the same results to end users when this goes live

I can't offer an opinion on the detail of how you propose to fix this, other than to say it looks like you are focusing on this end goal. It's fine to spend the time on this, so please do go ahead.

I have a couple of other observations:

  1. The "State" column no longer appears in the table in Staging
  2. The Downloads should also be updated to include the "State" column. I'm not sure if we have stated this explicitly before.

Finally, I'll leave it to you if you think it is better to set up a separate issue for aligning the Downloads. My preference would be for a new issue at this point.

cormachallinanderilinx commented 2 months ago

> As a check that the new Publisher search UI is returning a full and correct set of results

and

> What do you mean by the "actual publisher code" here, please Cormac?

The download CSV (even before these changes) was never developed to match the search functionality; they were developed as two separate things. The download button calls code that only downloads 'active' publishers who have published.

> The "State" column no longer appears in the table in Staging

Can you check if you are logged in? This is hidden if you are logged out.

siwhitehouse commented 2 months ago

> > What do you mean by the "actual publisher code" here, please Cormac?
>
> The download CSV (even before these changes) was never developed to match the search functionality; they were developed as two separate things. The download button calls code that only downloads 'active' publishers who have published.

So, "actual publisher code" means the code that the UI uses to fetch and display publishers.

> > The "State" column no longer appears in the table in Staging
>
> Can you check if you are logged in? This is hidden if you are logged out.

The staging website is still showing me an internal server error, but I expect you are right (D'oh)

siwhitehouse commented 2 months ago

@cormachallinanderilinx I have gone through this issue and I think the following have been discussed and have not yet been implemented:

  1. error counts
  2. removing the sorting from the table headers
  3. a filter for the different publisher states
  4. a country filter
  5. word wrapping in tables
  6. searching on multiple fields

You also said that:

“Country Search is a bit different as you cannot search on country name. I just got an idea on this as im typing, I will look into somthing and update you when I check it out.”

but I don't see anything since then.

Could you give us an update on these before I share this with the team please?

cormachallinanderilinx commented 1 month ago

  1. error counts (see below: causing slow page load)
  2. removing the sorting from the table headers (done)
  3. a filter for the different publisher states (done: added a dropdown to show either only approval_needed or all publishers. I don't see anywhere in the issue that there should be a filter for showing only active or only approval_needed, but I think showing only approval_needed will be very useful)
  4. a country filter (done: dropdown with a list of available countries)
  5. word wrapping in tables (made some changes, but when the columns on the right are made smaller it affects the header)
  6. searching on multiple fields (which fields should be searchable? The Acceptance Criteria ask for "a search box that can accept a name, org-id, or country", so the search was implemented so that anything typed into the search box filters on all fields in the table, as it was suggested to have a single control for searching. If we want to add a specific search for one field, we will need more controls, like Countries and State)
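As a rough illustration of the single-search-box behaviour described in point 6, the sketch below matches a query against every listed field. The field names (`name`, `org_id`, `country`) and the helper names are hypothetical and are not taken from the extension's code.

```python
SEARCHABLE_FIELDS = ("name", "org_id", "country")  # hypothetical field names


def matches(publisher, query, fields=SEARCHABLE_FIELDS):
    """Return True if the query appears (case-insensitively) in any searchable field."""
    needle = query.strip().lower()
    if not needle:
        return True  # an empty search box filters nothing out
    return any(needle in str(publisher.get(field) or "").lower() for field in fields)


def search(publishers, query):
    """Filter a list of publisher dicts with a single free-text query."""
    return [publisher for publisher in publishers if matches(publisher, query)]
```

A dedicated control per field, such as the country dropdown, would instead compare against one field only.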

For error counts, I have been testing here, and as you can see the page takes a long time to load: https://staging.iatiregistry.org/publisher/?q=&publisher_country=&state=&sort=created%20asc&page=3 One publisher has 35 datasets and another has 59; the rest average roughly 2, which I think is a good test for production.

The reason is that, to get the count, I first need to get all the packages for each org that is loading. Then, for each package, I need to call the IATI validator API: GET https://api.iatistandard.org/validator/report?name= For this example it needs to run about 100 times and wait for each response while the page is loading, so getting a count slows the page load considerably. Unless there is an API where we can get all errors for an org, I'm not sure there is a clean or out-of-the-box way to get it.

I also looked at downloading the CSV from https://api.iatistandard.org/vs/pvt/publishers/79d85709-2e20-4174-bc3d-d7179f6cc0eb/documents but my key is invalid. In any case, this would mean for loops over the rows after reading in a CSV, which I don't think is a great thing to do on a page load.
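One possible way to keep the validator calls off the page load is to cache each dataset's error count for a while and only call the validator when the cached value is missing or stale. The sketch below is a generic in-memory cache, not an existing part of the extension; `loader` stands for whatever function ends up calling the validator API, and all names here are hypothetical.

```python
import time


class TTLCache:
    """Cache loader results per key and reuse them until they are ttl_seconds old."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._loader(key)  # the slow call, e.g. one validator request
        self._entries[key] = (now, value)
        return value


def publisher_error_count(dataset_names, cache):
    """Sum the cached per-dataset error counts for one publisher."""
    return sum(cache.get(name) for name in dataset_names)
```

With this shape, only the first request for a dataset within the time window waits on the validator; a background job could also warm the cache so that no page load pays the cost.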