Closed steveoh closed 9 years ago
Looks like the query for new dates may be off a little. I ran the command three times in a row this morning and got this:
This looks like it's headed in the right direction to me other than the possibly duplicate data that I mention above. Although I couldn't find the duplicate data in my database so maybe it's just the script reporting that's incorrect?
The first time you ran it new dates were added. So the query the next time updated with the most current date. The script should not run twice in the same day since like you noticed you will get that days results. What were you expecting? Consistent numbers for all three runs? Or an eventual 0?
refs #2
give this a once over @stdavis.
Looks like there are some failing tests.
i think this is ready. Your call @stdavis
Just a couple more comments/questions. I think that it's pretty close.
since that diff is outdated here is the comment now.
I think we do need to do something here.
Two scenarios I mapped out.
You get 123 and 123_wqx where 123 is already in the database.
I think in this situation we would insert a duplicate station.
You get 123 and 123_wqx where neither is in the database.
I think we insert the correct one.
If we strip the wqx prior, we fix 1, but then we would insert the non wqx station in example 2 since extract_stations_by_id
would extract the non wqx station from the response.
Can you confirm this so I feel better about it?
I think that you should do a similar check for existing data (using the SampleId field?) for results that you do with stations above to prevent duplicate data from polluting the database. I would expect that an update process should be able to be run more than once per day without negative effects.
promoting this as well
since that diff is outdated here is the comment now.
I think we do need to do something here.
Two scenarios I mapped out.
You get 123 and 123_wqx where 123 is already in the database.
I think in this situation we would insert a duplicate station.
You get 123 and 123_wqx where neither is in the database.
I think we insert the correct one.
If we strip the wqx prior, we fix 1, but then we would insert the non wqx station in example 2 since extract_stations_by_id would extract the non wqx station from the response.
Can you confirm this so I feel better about it?
I think that's correct but would like to talk through it with you.
I think this is done. There is no unique field in results to check for unique. so when this is ready lets :shipit:
Couldn't you check for sampleID to see if the batch has already been inserted? I think that would be enough to prevent duplicate data.
On Thu, Oct 8, 2015 at 5:31 PM, steveoh notifications@github.com wrote:
I think this is done. There is no unique field in results to check for unique. so when this is ready lets [image: :shipit:]
— Reply to this email directly or view it on GitHub https://github.com/agrc/ugs-db/pull/37#issuecomment-146715581.
Scott Davis Utah AGRC http://gis.utah.gov/developer/ stdavis@utah.gov @SThomasDavis https://twitter.com/SThomasDavis
I would ask Paul first
SampleID should take care of that. It should be closely tied to the date of the sample.
:confetti_ball: i think it should be ready @stdavis :mag: :man: :mag_right:
I'd like an early preview review please