VertNet / webapp

VertNet web application

Record Estimate in d/l pop-up is broken #349

Closed: dbloom closed this issue 10 years ago

dbloom commented 11 years ago

Aaron,

While checking on some search issues presented by the folks at UTEP, and then double-checking things with Laura, I have discovered that the estimated number of records that VertNet says will be returned via a download is very inaccurate.

I've tested three institutions in Chrome and Firefox, and the results are the same in both browsers. I tested UTEP, PMNS, and GSU, using both full-text searches and the "Search this Publisher" button on the publishers list. The results were the same (although the downloaded files varied in size by about 1000k records depending on whether I used the full-text or publisher search, which was expected).

For UTEP (Centennial Museum), VN estimated that I would be downloading ~217,743 records. This could be correct for a full-text search, but the file I downloaded only includes ~54k records. When using the "Search this Publisher" option I get the same VN estimate, but I should only get 54,300, the number of records in VN via the UTEP IPT. The downloaded file does in fact contain 54,300 records. [image: utep_dlestimate]

For PMNS (Perot), VN estimates that I will be downloading only 63 records with both full-text and publisher searches. There is no way this can be correct, since they have 7,509 records in the IPT. The downloaded files from both search types contain only 63 records, which suggests that something else is very wrong with the PMNS data set and/or the harvest of the PMNS IPT. Laura is documenting this one further in a separate issue; the two issues may be related. [image: pmns_dlestimate] https://f.cloud.github.com/assets/942447/1097598/5b493c7a-1712-11e3-9841-6bd740210b8f.jpg

For GSU, VN estimates that my download will include ~12,957 records for both full-text and publisher searches. For both, I should have a minimum of 24,851 records per the GSU IPT. When I download the publisher search results, I get a record set with 24,851 records. [image: gsu_dlestimate] https://f.cloud.github.com/assets/942447/1097597/5b47e848-1712-11e3-9d3b-80ec6d4788b1.jpg

Checking some other publishers at random, using the Search this Publisher button:

- DMNS (Denver): publisher search estimates ~71 records; should be at least 56k.
- KU: publisher search estimates ~483,093; should be at least 686k.
- NMMNH: publisher search estimates ~14,092; should be no more than 6,211.

I know that the estimated record counts, particularly the "1-20 of thousands" type, are provided that way to improve performance, and that there is going to be some play in those numbers; I totally understand the cost/benefit involved. When a user is ready to download a record set, however, I think VN needs to be much more accurate in its estimate, even if it takes a few seconds more for the pop-up window to appear. My concern is that if users see that they are going to get ~217k records back from my UTEP search and they only get back 54k in the download, or they expect 450k from KU and get back 686k in the download, they are likely to think one or more of the following: (1) that their computer/Excel/text reader/browser is not functioning properly, (2) that there is an error in the query that they can't fix, and/or (3) that VertNet and the data returned via search are not reliable, so they'd better go to GBIF to get the data.

eightysteele commented 11 years ago

Nice issue! Looking into this. Also looking into the download link issue you submitted via email.

(Sent by email in reply to dbloom's post of Fri, Sep 6, 2013, 9:44 AM.)

laurarussell commented 11 years ago

Actually, I'm going to just add my screencasts to this issue so you can see firsthand. My screencasts also document two other issues, though, so I'll add those as separate issues.

https://www.dropbox.com/s/jfn978az9rxn5sk/DownloadAndCountsNotCorrectPerot.mov

https://www.dropbox.com/s/05vcz5nzl2bjbtr/DownloadRecordCountsIssueSchmidtPart1.mov

https://www.dropbox.com/s/a3v3xjif1v6k3k3/DownloadRecordCountsIssueSchmidtPart2.mov

eightysteele commented 11 years ago

Great! I'm able to reproduce the issue, but this is definitely great to have.

(Sent by email in reply to laurarussell's comment of Fri, Sep 6, 2013, 10:01 AM.)

dbloom commented 11 years ago

I'm off and on today, but let me know if you need anything more on this.

(Sent by email in reply to Aaron Steele's comment of Sep 6, 2013, 10:04 AM.)

tucotuco commented 10 years ago

The original issue as described is solved. The pop-up now always shows correct counts matching the downloads for <10k records, and shows ">10k" for larger result sets. Large file downloads are a separate issue, #376.
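A minimal sketch of the display rule described in the fix, assuming the search backend returns a record count that is exact up to 10,000 and only an estimate beyond that. The names `EXACT_COUNT_LIMIT` and `count_label` are hypothetical and are not taken from the VertNet codebase.

```python
# Hypothetical sketch, not VertNet's actual code: EXACT_COUNT_LIMIT and
# count_label are illustrative names.

EXACT_COUNT_LIMIT = 10_000  # the ">10k" cutoff described in this issue


def count_label(count: int, limit: int = EXACT_COUNT_LIMIT) -> str:
    """Return the record-count text shown in the download pop-up.

    Assumes the backend's `count` is exact whenever it is <= `limit`
    and only an estimate above that.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if count <= limit:
        # Exact count: the download should contain exactly this many records.
        noun = "record" if count == 1 else "records"
        return f"{count:,} {noun}"
    # Above the limit the count is only an estimate, so show the limit as
    # a floor instead of promising a specific number.
    return f">{limit:,} records"


if __name__ == "__main__":
    for n in (63, 7_509, 217_743):
        print(n, "->", count_label(n))
```

With this rule, small result sets get a number the download can actually match, while a large estimate such as ~217,743 is shown only as ">10,000 records", so the pop-up never promises a specific figure it cannot guarantee.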