GLL-Better / opendata

Documentation, Changelog and Issues related to the GLL RPDE endpoint

Is the number of GLL sessions increasing due to duplicates? #20

Open domfennell opened 5 years ago

domfennell commented 5 years ago

Hi @itsterry,

Hope you're well. This issue might relate to @GetTheData's issue raised earlier this year. However, we are not sure if this is a problem with our platform, or if there are some duplicates in the GLL feed.

You will see from the following screenshot that there is one "Tots Waterworld" session set to take place at Church Farm Leisure Centre on Tuesday 11th December: screen shot 2018-12-06 at 11 36 06

However, when we go to locate this session in our platform, we find that there are actually 5 identical (as far as we can see) sessions except with a different id each, which are as follows:

Our platform tells us that the first session in the list (gll-14010181) was created (in our platform) on 2017-12-19 22:36:41. The remaining 4 sessions were all created in our platform on 2018-11-17 10:09:59.

We've been monitoring the GLL data in our platform over the past 4 weeks, as the number of sessions has been steadily increasing. On 5th November, there were 17.54k sessions; on 5th December, there were 21.93k. The number continues to increase.

We wondered if you might be able to take a look at this?

Thanks and best,

Dom

domfennell commented 5 years ago

Hi @itsterry (cc @GetTheData),

I've done some more digging and found 5 more seemingly identical (save for different ids) sessions in our platform pertaining to a "Swim For All" session taking place at Thamesmere Leisure Centre on 7th December:

The dates these ids were created in our platform (if helpful to identify any discrepancies your side) are as follows:

Whilst the "Tots Waterworld" session in my first comment appeared only once on the Better website, there are 3 seemingly identical sessions (save for location) on the Thamesmere Leisure Centre site, all with time 14:30 - 16:00, as illustrated in the following screenshots: screen shot 2018-12-06 at 12 35 10 screen shot 2018-12-06 at 12 34 56

As mentioned, the one notable difference (as far as we can tell) is that the third of the three listings has Teaching Pool as the location, whilst the other two have Main Pool. However, we can't tell from the data we receive via the endpoint how this different location is represented. If it is accounted for somewhere, the third listing would effectively not be a duplicate.

GetTheData commented 5 years ago

Hi @domfennell (and FYI @itsterry),

I only have one of your examples in my live DB: 35028112

As per issue #13, there are no deleted items in the GLL feed.

So I pull down the entire feed once a day, and if an item is missing I consider it deleted.

I do have 35025758 and 35025322, but have marked them as deleted.
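The full-download approach described above can be sketched roughly as follows. This is only an illustration, not Dan's actual code: the function name `mark_missing_as_deleted`, the in-memory `stored` dict, and the `deleted` flag are all assumptions, and the ids are reused from this thread purely as sample data.

```python
from typing import Dict, Set


def mark_missing_as_deleted(stored: Dict[str, dict], snapshot_ids: Set[str]) -> Set[str]:
    """Flag stored items that are absent from today's full snapshot.

    Returns the ids that were newly flagged as deleted on this run.
    """
    newly_deleted = set()
    for item_id, item in stored.items():
        if item_id not in snapshot_ids and not item.get("deleted", False):
            item["deleted"] = True
            newly_deleted.add(item_id)
    return newly_deleted


# Hypothetical local store keyed by session id.
stored = {
    "35028112": {"deleted": False},
    "35025758": {"deleted": False},
    "35025322": {"deleted": False},
}
# Hypothetical set of ids seen in today's complete download of the feed.
todays_ids = {"35028112"}

print(sorted(mark_missing_as_deleted(stored, todays_ids)))
```

This only gives sensible results when the download is known to be complete; a partial or failed pull would wrongly flag live sessions as deleted.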

Does that help?

Dan

domfennell commented 5 years ago

Hi @GetTheData (cc @itsterry),

Thanks for your comment, and apologies for the delayed reply - OOO on Friday. Noted RE the deleted items.

Could you let me know what you have for 35028111, 35025320 & 35025760, please? The GLL website for Thamesmere Leisure Centre is showing 3 "Swim For All" sessions, all running between 14:30-16:00 on Friday 14th December:

screen shot 2018-12-10 at 11 16 36

Save for different ids and 1 of these sessions taking place in a different pool in the LC, they appear to be identical.

GetTheData commented 5 years ago

Hi @domfennell (cc @itsterry) I have the same - three identical events save for a different pool.

domfennell commented 5 years ago

Thanks for confirming that, @GetTheData.

@itsterry, could you please shed some light on this asap? Many thanks in advance.

itsterry commented 5 years ago

Hi @GetTheData & @domfennell

The data is correct according to the database

```
2.3.3 :001 > ki=Klassinstance.find 35028111
 => #<Klassinstance id: 35028111, klass_id: 131696, start: "2000-01-01 14:30:00", finish: "2000-01-01 16:00:00", created_at: "2018-11-13 15:12:39", updated_at: "2018-11-27 13:13:06", note: nil, cancelled: nil, date: "2018-12-14", instructor: nil, lanes_open: nil, activity_group: "pool", business_sector_id: 2, bookable: false, booking_option: "no_booking_required", course: false, custom_booking_instruction: false, custom_booking_instruction_text: "", ticketable: false, venue_live: true, deleted_at: nil>

2.3.3 :002 > ki=Klassinstance.find 35025320
 => #<Klassinstance id: 35025320, klass_id: 131683, start: "2000-01-01 14:30:00", finish: "2000-01-01 16:00:00", created_at: "2018-11-13 14:51:28", updated_at: "2018-11-27 13:13:01", note: nil, cancelled: nil, date: "2018-12-14", instructor: nil, lanes_open: nil, activity_group: "pool", business_sector_id: 2, bookable: false, booking_option: "no_booking_required", course: false, custom_booking_instruction: false, custom_booking_instruction_text: "", ticketable: false, venue_live: true, deleted_at: nil>

2.3.3 :003 > ki=Klassinstance.find 35025760
 => #<Klassinstance id: 35025760, klass_id: 131685, start: "2000-01-01 14:30:00", finish: "2000-01-01 16:00:00", created_at: "2018-11-13 14:52:57", updated_at: "2018-11-27 13:13:01", note: nil, cancelled: nil, date: "2018-12-14", instructor: nil, lanes_open: nil, activity_group: "pool", business_sector_id: 2, bookable: false, booking_option: "no_booking_required", course: false, custom_booking_instruction: false, custom_booking_instruction_text: "", ticketable: false, venue_live: true, deleted_at: nil>
```

The venue manager has put in 3 different instances for 3 different activities ('klass_id'), so they're technically not duplicates.

I'll see if there's a process adjustment we can make with the people who are inputting the data to stop them inputting (data which looks like) duplicates (to a human), but from a system point of view, all 3 are valid.
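For consumers who want to spot sessions that look identical to a human even though the system treats them as distinct, one option is to group on the visible fields and ignore `id` and `klass_id`. The sketch below is an assumption-laden illustration, not part of the GLL system: `find_lookalikes` and the choice of comparison fields are hypothetical, and the sample records copy only a few attributes from the three console records above. A real check would probably also compare venue and activity name, which are not shown on these records.

```python
from collections import defaultdict


def find_lookalikes(sessions, keys=("date", "start", "finish", "activity_group")):
    """Group sessions by the chosen visible fields.

    Returns only the groups containing more than one session id.
    """
    groups = defaultdict(list)
    for session in sessions:
        groups[tuple(session[k] for k in keys)].append(session["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Subset of the attributes shown in the console records above.
sessions = [
    {"id": 35028111, "klass_id": 131696, "date": "2018-12-14",
     "start": "2000-01-01 14:30:00", "finish": "2000-01-01 16:00:00",
     "activity_group": "pool"},
    {"id": 35025320, "klass_id": 131683, "date": "2018-12-14",
     "start": "2000-01-01 14:30:00", "finish": "2000-01-01 16:00:00",
     "activity_group": "pool"},
    {"id": 35025760, "klass_id": 131685, "date": "2018-12-14",
     "start": "2000-01-01 14:30:00", "finish": "2000-01-01 16:00:00",
     "activity_group": "pool"},
]

for key, ids in find_lookalikes(sessions).items():
    print(key, ids)
```

Because each record has a different `klass_id`, including that field in `keys` would put every session in its own group and hide exactly the look-alikes discussed here.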

domfennell commented 5 years ago

Thanks, @itsterry. Very helpful info! So the duplicated sessions that are technically not duplicates seem to be one issue (potentially to be addressed by the process adjustment you mentioned).

It appears we still have another problem our end, where we have other duplicated sessions in our platform, as well as sessions in our platform that do not match the data on the GLL website. This suggests the platform acquired these sessions at some point in the past, but has not been told which of them have since been deleted from the GLL feed. This would point to the issue that @GetTheData referenced earlier in the thread: https://github.com/GLL-Better/opendata/issues/13.

The absence of the implementation you mentioned on March 30th might mean that other data consumers that do not follow @GetTheData's process (pull down the entire feed daily and consider any missing items to be deleted) also have inaccurate data pertaining to GLL activities.

Do you have an update on this issue, please? Any help gratefully received!

itsterry commented 5 years ago

Yes indeed. Updated in #13

domfennell commented 5 years ago

Hi @itsterry, hope you're having a good day.

Can you please confirm whether the sessions advertised on the GLL website (and illustrated by the following screenshot) due to take place at Eldon LC on Monday 17th are duplicates? screen shot 2018-12-13 at 16 36 29

I believe the ids of the 4 Circuits sessions are 27394981, 27394982, 27400355 & 27394980.

itsterry commented 5 years ago

Two of them were duplicates. The remaining two are not duplicates (they have different klass_ids)

T


Terry Shuttleworth


domfennell commented 5 years ago

Hi @itsterry, hope you're well.

We've come across a few more duplicate sessions in a couple of the Better timetables. I wonder if you could take a look at the following and let us know what you think?

Mile End Park Leisure Centre: The 3 aerobics sessions due to take place tomorrow (19th March) at 11:30am have ids 42729981, 42729982 and 42729980. We phoned the leisure centre and they told us that they only have one of these sessions in their booking system. Screen Shot 2019-03-18 at 15 28 42

Eldon Leisure Centre: The 2 aerobics sessions due to take place tomorrow (19th March) at 11:15am have ids 43508596 and 43508597. We tried to speak to someone at the leisure centre, but couldn't get through, so we don't know if this is reflected in their booking system or not. Screen Shot 2019-03-18 at 15 42 47

domfennell commented 5 years ago

Hi @itsterry, hope you're well.

Just wondering if you'd managed to find time to take a look at my last message RE duplicate sessions appearing on GLL's websites?

The screenshot below shows a number of seemingly duplicate sessions taking place at Eldon Leisure Centre on Thursday 28th March:

Screen Shot 2019-03-27 at 13 52 37

The screenshot below shows 4 yoga sessions taking place on Saturday 30th March at Sobell Leisure Centre, all with the same information appearing on the timetable:

Screen Shot 2019-03-27 at 14 27 57

I'm trying to contact staff at both leisure centres to find out if these are duplicated in their system. Wonder if you could shed any light on this?

Thanks in advance.

(cc @stephenwinfield)

itsterry commented 5 years ago

Hi @domfennell

We're trying to iron out the duplication issue. It's been plaguing us for a while now

Although we think we've dealt with it on new class creation, it seems to be creeping back in when venue managers update existing classes. We're trying to work out why

Please don't contact centres to ask if they have duplicate sessions: those guys have enough on their plates without worrying about tech stuff: leave it to us!

T

domfennell commented 5 years ago

Hi @itsterry, thanks for coming back to me and thanks for the clarification.

Appreciate that it seems to be a tricky issue and that you're working on it. If you could give us a rough idea of when you think it might be fixed, that would be most helpful... We're currently providing our customers with sessions that will appear identical to their end users!

Dom

GetTheData commented 5 years ago

@itsterry @domfennell it sounds like this is in hand, but just confirming I am also seeing duplicates, e.g. this example from Sedgley Library:

image

itsterry commented 5 years ago

Hi @domfennell

It's been a while and reports of duplicates on the GLL system have pretty much ceased

Just checking: is this still an issue?

T

domfennell commented 5 years ago

Hi @itsterry,

Thanks for letting me know. I've just taken a look at some of the links provided earlier in this issue and there are certainly fewer duplicates appearing than before, which is good news. However, there are still some appearing.

Eldon Leisure Centre on a Thursday: Screen Shot 2019-10-02 at 10 11 37

Sobell Leisure Centre on a Tuesday and Wednesday, respectively: Screen Shot 2019-10-02 at 10 10 01 Screen Shot 2019-10-02 at 10 10 18

Dom

itsterry commented 5 years ago

Thanks @domfennell - I suspect that despite our attempts to prevent the creation of duplicates, we're always going to have a few.

I'll use these as examples, though, and see how they've crept in