AtlasOfLivingAustralia / data-management

Data management issue tracking

Mark problem lists as private #865

Closed peggynewman closed 1 year ago

peggynewman commented 1 year ago

Use a MySQL query on the lists to bulk update the following lists as private:

Write the query, create a temp table to keep a copy of the data, apply the update, then remove the temp table after testing.
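The backup, update and clean-up steps could be sketched roughly as below. This is only an illustration under assumptions: Python's built-in `sqlite3` with an in-memory database stands in for the MySQL lists database, and the table name `species_list`, the columns `id`, `list_name` and `is_private`, the backup table name and the example ids are all hypothetical, not the real schema or the real problem lists. A plain table is used for the copy, on the assumption that it should outlive a single session while testing.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL lists database (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and columns; the real lists schema may differ.
cur.execute(
    "CREATE TABLE species_list ("
    "id INTEGER PRIMARY KEY, list_name TEXT, is_private INTEGER)"
)
cur.executemany(
    "INSERT INTO species_list (id, list_name, is_private) VALUES (?, ?, ?)",
    [(1, "list to keep", 0), (2, "problem list a", 0), (3, "problem list b", 0)],
)
conn.commit()

# Placeholder condition selecting the lists to hide.
target = "id IN (2, 3)"

# 1. Keep a copy of the rows that are about to change.
cur.execute(
    f"CREATE TABLE species_list_backup AS SELECT * FROM species_list WHERE {target}"
)
backed_up = cur.execute("SELECT COUNT(*) FROM species_list_backup").fetchone()[0]

# 2. Apply the bulk update.
cur.execute(f"UPDATE species_list SET is_private = 1 WHERE {target}")
updated = cur.rowcount
conn.commit()

# 3. Check the result, then remove the backup copy.
private_ids = [
    row[0]
    for row in cur.execute(
        "SELECT id FROM species_list WHERE is_private = 1 ORDER BY id"
    )
]
cur.execute("DROP TABLE species_list_backup")
conn.commit()
remaining_tables = [
    row[0]
    for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Comparing `backed_up` with `updated` before dropping the copy is a cheap check that the backup and the update covered the same rows.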

@TaniaGLaity are there other bulk operations you'd recommend?

TaniaGLaity commented 1 year ago

all others would need more consideration I think

rosemaryjoconnor commented 1 year ago

Hi Peggy, I got slightly different numbers for these in Prod:

I'll start on lists-test now with the creation of the temp table etc. Good to do something a little different! Chat soon, Rose



peggynewman commented 1 year ago

Those numbers look reasonable. Make sure you don't change anything that is flagged authoritative (authoritative records are indexed with the list ID in pipelines).
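One way to honour that constraint might be to put the authoritative flag directly into the `WHERE` clause of the update, so an authoritative list is skipped even if its id ends up in the target set by mistake. A minimal sketch, assuming a hypothetical `is_authoritative` column on a hypothetical `species_list` table, with `sqlite3` standing in for MySQL and made-up rows:

```python
import sqlite3

# In-memory SQLite stands in for the MySQL lists database (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema; column names are illustrative only.
cur.execute(
    "CREATE TABLE species_list ("
    "id INTEGER PRIMARY KEY, list_name TEXT, "
    "is_private INTEGER, is_authoritative INTEGER)"
)
cur.executemany(
    "INSERT INTO species_list VALUES (?, ?, ?, ?)",
    [
        (1, "junk list", 0, 0),
        (2, "junk-looking but authoritative", 0, 1),
        (3, "already private", 1, 0),
    ],
)
conn.commit()

# The guard: never touch authoritative lists, and skip rows that are
# already private so the row count reflects real changes only.
cur.execute(
    "UPDATE species_list SET is_private = 1 "
    "WHERE id IN (1, 2, 3) AND is_authoritative = 0 AND is_private = 0"
)
changed = cur.rowcount
conn.commit()

state = dict(cur.execute("SELECT id, is_private FROM species_list ORDER BY id"))
```

Here only list 1 changes; list 2 stays public because it is flagged authoritative.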

We should be able to run it intermittently as a clean-up script.

As for those account-related issues, we might have to look in the user admin system for a list of user IDs for those. Not sure exactly.

rosemaryjoconnor commented 1 year ago

Records updated in lists-test. The raw MySQL scripts are in GitHub: authoritative-lists/source-code/scripts.

peggynewman commented 1 year ago

I can't see the scripts in GH ... but wonder why there are 3? Can you do multiple queries in one script?

rosemaryjoconnor commented 1 year ago

Odd, they should be there. I'll check first thing tomorrow. There are only 3 because I was testing bits separately. I intended to put it all into one and comment it once I was sure it was fine.

The update is simple; it was getting the other queries correct that I was being careful with.

I was being super-cautious!


rosemaryjoconnor commented 1 year ago

OK, so I couldn't leave it! I had done the commit but forgot to push... done now.



TaniaGLaity commented 1 year ago

Lists in test look heaps better - got rid of a LOT of junk!

Still need to make ones that have test in the List Name private too.

Also had another idea for marking some private: can you de-duplicate lists? E.g. where they have the same name, type, owner, date submitted, date updated and item count, make all but one private?

rosemaryjoconnor commented 1 year ago

Hi Tania, thanks for looking at those so quickly. Re:

  1. still need to make ones that have test in the List Name private too - ah, I can see the bug in the SQL. I will get that fixed.
  2. de-duplicating: I will have a look at how to do that asap. Should be do-able!

thanks Rose
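For the "test in the List Name" case, a substring match is the obvious filter, but it can also pick up names that merely contain those letters, so previewing the matches before updating seems worthwhile. The sketch below is an assumption-laden illustration, not the script from the repository: `species_list`, `list_name` and `is_private` are hypothetical names, the rows are made up, and `sqlite3` stands in for MySQL (where case sensitivity of `LIKE` depends on the column collation). Whether whole-word matching is the right narrowing rule is a judgment call.

```python
import re
import sqlite3

# In-memory SQLite stands in for the MySQL lists database (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE species_list ("
    "id INTEGER PRIMARY KEY, list_name TEXT, is_private INTEGER)"
)
cur.executemany(
    "INSERT INTO species_list VALUES (?, ?, ?)",
    [
        (1, "Test list", 0),
        (2, "my TEST upload", 0),
        (3, "Birds of Tasmania", 0),
        (4, "Latest sightings", 0),  # contains "test" inside "Latest"
    ],
)
conn.commit()

# Preview: case-insensitive substring match on the list name.
candidates = cur.execute(
    "SELECT id, list_name FROM species_list "
    "WHERE LOWER(list_name) LIKE '%test%' ORDER BY id"
).fetchall()

# One possible narrowing: keep only names where "test" is a whole word.
whole_word = re.compile(r"\btest\b", re.IGNORECASE)
confirmed_ids = [list_id for list_id, name in candidates if whole_word.search(name)]

# Update only the confirmed ids.
cur.executemany(
    "UPDATE species_list SET is_private = 1 WHERE id = ?",
    [(list_id,) for list_id in confirmed_ids],
)
conn.commit()
```

In this made-up data the preview returns three candidates, and the whole-word check drops "Latest sightings" before anything is changed.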



rosemaryjoconnor commented 1 year ago

Hi Tania, just taking a look at the de-dup logic. It is do-able of course. However, using the date updated may not be particularly useful, as it changes whenever anything in the metadata changes, even a typo fix in a name. So it could be changed on one duplicate but not on another. Doug is in the process of adding another field, a date loaded, which will indicate when the data was last loaded. I think this would be more useful to include when it is available.

In the meantime I'll see what I can pull out and get things in place.

thanks Rose



peggynewman commented 1 year ago

Hey Rose, I think that just retaining the one with the latest date updated, even if only the metadata changed, is probably fine to go ahead with. My suspicion is that most of these are not high quality lists anyway. Got a few examples of duplicated pairs: "fire names 2v" and "fire names 2v"; "Flora of the Sydney Basin Region" and "Flora of the Sydney Basin Region".
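The rule agreed here (group lists that share name, type, owner, date submitted and item count, keep the one with the latest date updated, make the rest private) could be sketched as follows. Everything in it is an assumption for illustration: the table `species_list`, its column names, the sample rows, and the tie-break on the highest `id` when two duplicates share the same date updated. `sqlite3` stands in for MySQL. The ids are selected first and updated in a second step, partly because MySQL does not allow an `UPDATE` to read its own target table in a subquery.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL lists database (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema and made-up rows.
cur.execute(
    "CREATE TABLE species_list ("
    "id INTEGER PRIMARY KEY, list_name TEXT, list_type TEXT, owner TEXT, "
    "date_created TEXT, last_updated TEXT, item_count INTEGER, is_private INTEGER)"
)
rows = [
    (1, "list A", "SPECIES", "user1", "2020-01-01", "2020-01-01", 10, 0),
    (2, "list A", "SPECIES", "user1", "2020-01-01", "2021-06-01", 10, 0),
    (3, "list B", "SPECIES", "user2", "2019-05-05", "2019-05-05", 200, 0),
    (4, "list B", "SPECIES", "user2", "2019-05-05", "2019-05-05", 200, 0),
    (5, "list B", "SPECIES", "user3", "2019-05-05", "2019-05-05", 200, 0),
]
cur.executemany("INSERT INTO species_list VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# A row is a duplicate to hide if another row matches on every grouping
# column and is newer (or equally new with a higher id, as a tie-break).
dup_sql = """
SELECT DISTINCT a.id
FROM species_list AS a
JOIN species_list AS b
  ON  a.list_name = b.list_name
  AND a.list_type = b.list_type
  AND a.owner = b.owner
  AND a.date_created = b.date_created
  AND a.item_count = b.item_count
  AND (b.last_updated > a.last_updated
       OR (b.last_updated = a.last_updated AND b.id > a.id))
ORDER BY a.id
"""
to_hide = [row[0] for row in cur.execute(dup_sql)]

cur.executemany(
    "UPDATE species_list SET is_private = 1 WHERE id = ?",
    [(list_id,) for list_id in to_hide],
)
conn.commit()

still_public = [
    row[0]
    for row in cur.execute(
        "SELECT id FROM species_list WHERE is_private = 0 ORDER BY id"
    )
]
```

List 5 stays public because its owner differs, so it is not treated as a duplicate. A real run would presumably also carry the authoritative guard mentioned earlier in the thread.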

peggynewman commented 1 year ago

This is with Dave, will wait till finished

djtfmartin commented 1 year ago

I've run these scripts. In addition I ran the following: https://gist.github.com/djtfmartin/551e632d75679b574bdc037abaa79208

I'm closing this issue. We can create new issue(s) with problem lists once they are identified.

TaniaGLaity commented 5 months ago

@checksfields this is the ticket and link to the script for Dave's cleaning up of lists. Amanda doesn't appear to be able to access this GitHub repository.