DigitalCommons / mykomap

A web application for mapping initiatives in the Solidarity Economy

[CWM] Create North America Map with MykoMap 3.1.7 #273

Closed ColmDC closed 1 month ago

ColmDC commented 1 month ago

Track time under Cooperative World Map in Clockify. Create a MM using the code version and data currently published in https://dev.maps.solidarityeconomy.coop/qa/dc-ica-ncba-cuk/ and the following data changes.

1) Drop Co-ops UK Data
2) Drop DotCoop registrants that are not in US, Canada or Mexico, but use latest DotCoop data
3) Include the following US Data sources in Google Drive
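Change 2 amounts to a row filter on the normalised DotCoop data. A minimal sketch, assuming the CSV has a `Country ID` column holding ISO 3166-1 alpha-2 codes (the column name and the sample rows are assumptions, not taken from the real data):

```python
import csv
import io

# ISO 3166-1 alpha-2 codes for the three countries to keep.
NORTH_AMERICA = {"US", "CA", "MX"}

def keep_north_america(rows, country_field="Country ID"):
    """Return only the rows whose country code is US, CA or MX.

    `country_field` is a hypothetical column name; adjust to the real schema.
    """
    return [
        row for row in rows
        if (row.get(country_field) or "").strip().upper() in NORTH_AMERICA
    ]

# Tiny made-up sample standing in for the DotCoop CSV.
sample = io.StringIO(
    "Name,Country ID\n"
    "Example Co-op A,US\n"
    "Example Co-op B,GB\n"
    "Example Co-op C,mx\n"
)
kept = keep_north_america(csv.DictReader(sample))
```

Normalising case and whitespace before the comparison avoids silently dropping rows such as `mx`.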

ColmDC commented 1 month ago

See old ticket for previous merge map.

ColmDC commented 1 month ago

For US NCG Members,

US Farm Credit Administrators,

USDA Agricultural Cooperatives

US Credit Unions

ColmDC commented 1 month ago

They will need to have filters added for the new data sources. (Note that this method of filtering by data source will not scale much beyond this number of sources.)

ColmDC commented 1 month ago

Can we position the map at dev.maps.coop/north-america ?

wu-lee commented 1 month ago

So having gotten my head back into the space somewhat, note that this is not as simple as it first appeared from what you'd mentioned in Element.

I was thinking you were adding a single new dataset, hopefully with the same format and semantics as the CUK dataset. This would just mean running the same script as-is with the new dataset instead of the CUK dataset, possibly with minor changes for the names.

If the new dataset has different format/semantics then it needs converting to the normalised format first. Ok...

However, this is adding four new datasets, each one with different format and semantics, in place of the CUK dataset. So to start with this is four datasets to normalise into the canonical set of fields.

But then adding more datasets is a qualitatively different process from just replacing one for another. It's multiplicative. The current script isn't written to scale to N datasets, it's just written to work with four.

We need three new levels of matching, finding the overlaps between organisations in the datasets. We had four levels, so this is now seven - roughly doubling the steps the script has to perform, post-normalisation and cleaning, and assuming there are no funny edge cases because of quirks in the data.
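One way a script could scale to N datasets is to fold them in one at a time, with one matching pass per dataset, instead of hand-writing each level. The sketch below is not the current script. It assumes each dataset is already normalised to a list of dicts and that organisations are matched on a single `Domain` field, which is a simplification; the function name and sample records are hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]

def fold_datasets(datasets: Dict[str, Iterable[Record]], key: str = "Domain") -> List[dict]:
    """Merge any number of datasets, one matching pass per dataset.

    Note: a later record from the same dataset with the same key replaces
    the earlier one; a real script would need to handle that case.
    """
    merged: Dict[str, dict] = {}   # normalised match key -> merged organisation
    unmatched: List[dict] = []     # records with no usable key
    for name, records in datasets.items():
        for rec in records:
            k = (rec.get(key) or "").strip().lower()
            if not k:
                unmatched.append({"sources": {name: rec}})
            elif k in merged:
                merged[k]["sources"][name] = rec
            else:
                merged[k] = {"sources": {name: rec}}
    return list(merged.values()) + unmatched

# Made-up records using the reserved .example domain.
orgs = fold_datasets({
    "DC": [{"Name": "Example Co-op", "Domain": "coop-a.example"}],
    "NCBA": [
        {"Name": "Example Cooperative", "Domain": "Coop-A.example"},
        {"Name": "No Domain Co-op", "Domain": ""},
    ],
})
```

Adding a dataset then means adding one entry to the input mapping rather than another hand-written matching level.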

The resulting merged table will have many more fields - 75% more with 3 new datasets. Not all of these will be shown in the map, as the key ones are "merged" into some common fields. But I think at the very least you're asking for a flag field for each dataset indicating membership of any given organisation. Meaning there will be work modifying pop-up layout to show the new fields from the extra datasets on top of this.
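The per-dataset flag fields could be derived mechanically from whichever identifiers a merged organisation carries. A minimal sketch, assuming a merged row records one identifier per contributing dataset; the short dataset names and the `... Member` field naming are hypothetical:

```python
# Hypothetical short names for the seven datasets in the North America map.
DATASETS = ["DC", "ICA", "NCBA", "NCG", "FCA", "USDA", "CU"]

def membership_flags(source_ids: dict) -> dict:
    """Map per-dataset identifiers to boolean 'is in dataset X' flags.

    `source_ids` maps dataset name -> that dataset's identifier for the
    organisation. A missing or empty identifier means "not in that dataset".
    """
    return {f"{name} Member": bool(source_ids.get(name)) for name in DATASETS}

flags = membership_flags({"DC": "abc123", "NCBA": "42", "USDA": ""})
```

Each flag can then back one filter option without adding any dataset-specific fields to the pop-up.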

So it's doable but be warned that the amount of work is much larger than I thought, and moreover will require restructuring the script, and therefore it will take correspondingly longer...

When did this need to be done by? I think you said next Tuesday? If so I will probably need to work on it over the weekend.

ColmDC commented 1 month ago

Okay. Let's check in on this. Let's do a call before you spend more time on it.

ColmDC commented 1 month ago

There is a cleaned version of US NCG Members in GDrive. I have left in some intermediary columns which might be useful.

ColmDC commented 1 month ago

I've uploaded the normalised CSVs here https://github.com/DigitalCommons/demo-merge-map-data/tree/north-america-data

ColmDC commented 1 month ago

Filters: They will each need a filter option. Use the following text.

US NCG Member
US Farm Credit Administrator
USDA Registered Agricultural Coop
US Credit Union

Spanish

Miembro de NCG de Estados Unidos
Administrador de crédito agrícola de Estados Unidos
Cooperativa agrícola registrada en el USDA
Unión de crédito de Estados Unidos

French

Membre du NCG des États-Unis
Administrateur du crédit agricole des États-Unis
Coopérative agricole enregistrée auprès de l'USDA
Coopérative de crédit des États-Unis
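For reference, the labels above could be held as a lookup keyed by language and data source. This is only an illustrative data structure: the keys (`ncg`, `fca`, `usda`, `cu`), the language codes and the function are hypothetical and are not MykoMap's real configuration format.

```python
# Label text copied from the comment above; keys and layout are assumptions.
FILTER_LABELS = {
    "en": {
        "ncg": "US NCG Member",
        "fca": "US Farm Credit Administrator",
        "usda": "USDA Registered Agricultural Coop",
        "cu": "US Credit Union",
    },
    "es": {
        "ncg": "Miembro de NCG de Estados Unidos",
        "fca": "Administrador de crédito agrícola de Estados Unidos",
        "usda": "Cooperativa agrícola registrada en el USDA",
        "cu": "Unión de crédito de Estados Unidos",
    },
    "fr": {
        "ncg": "Membre du NCG des États-Unis",
        "fca": "Administrateur du crédit agricole des États-Unis",
        "usda": "Coopérative agricole enregistrée auprès de l'USDA",
        "cu": "Coopérative de crédit des États-Unis",
    },
}

def filter_label(source: str, lang: str = "en") -> str:
    """Look up a filter label, falling back to English for unknown languages."""
    labels = FILTER_LABELS.get(lang, FILTER_LABELS["en"])
    return labels.get(source, FILTER_LABELS["en"][source])
```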

ColmDC commented 1 month ago

For the Pop Up, we drop Co-ops UK Sector (Simplified) and do not add any unique fields for any of the new sets. New Filter booleans need adding too.

ColmDC commented 1 month ago

"Updated SQLite database with NCG committed on this branch as dc-ica-ncba-usda-ncg.db https://github.com/DigitalCommons/demo-merge-map-data/tree/north-america-data if you want to inspect (use sqlitebrowser which I think you may already have)"

ColmDC commented 1 month ago

Use NCBA logo rather than CWM one.

wu-lee commented 1 month ago

I've not examined this very deeply so far. Run out of time for today, will try and incorporate this data into a new map tomorrow. If you get time to look in the meantime, let me know if you can see any wonkiness in the data.

ColmDC commented 1 month ago

The map linked in the issue has countries as the category used in the directory - do you still want that?

Leave it as country for now.

ColmDC commented 1 month ago

Note that the new datasets don't have any primary activity values set

You mean in current data drop, or is there an issue with setting it for the new data sets?

wu-lee commented 1 month ago

Note that the new datasets don't have any primary activity values set

You mean in current data drop, or is there an issue with setting it for the new data sets?

Well, in the current drop mainly - correcting myself that they do have activity (hardwired). Adding a merged value in to the data, and excluding non-US/CA/MX organisations now.

ColmDC commented 1 month ago

Nice. I'm popping out for a quick lunch. Will look at latest data when back.

ColmDC commented 1 month ago

I noticed there isn't a DC Registered Column, but perhaps you intend to interpret that from presence or not of a DC Identifier?

wu-lee commented 1 month ago

I noticed there isn't a DC Registered Column, but perhaps you intend to interpret that from presence or not of a DC Identifier?

I can see a "DC Registered" column in the CSV?

wu-lee commented 1 month ago

Ok, there's a map now deployed here:

https://dev.maps.coop/north-america/

I needed to correct one address of a FICU organisation to make it geolocate in the US, not the UK (MILLINOCKE -> MILLINOCKET)

Some orgs don't have Primary Activity. The filter treats those as having the category "Other".

I think it looks like I've done everything?
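The "Other" fallback for missing Primary Activity can be pictured as a one-line default. This is a sketch of the described behaviour only, not the map's actual filter code; the function name is hypothetical.

```python
def activity_category(org: dict, field: str = "Primary Activity") -> str:
    """Return the organisation's activity, or "Other" when it is missing or blank."""
    value = (org.get(field) or "").strip()
    return value if value else "Other"

with_activity = activity_category({"Primary Activity": "Agriculture"})
without_activity = activity_category({"Name": "Example Credit Union"})
```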

ColmDC commented 1 month ago

The US ones have a major eastern offset.

ColmDC commented 1 month ago

The US ones have a major eastern offset.

Image

ColmDC commented 1 month ago

The US ones have a major eastern offset.

But only in Firefox. If I zoom in it resets correctly, so it's not the data. :-/

ColmDC commented 1 month ago

Map tab name -> North America Co-ops

ColmDC commented 1 month ago

But only in Firefox. If I zoom in it resets correctly, so it's not the data. :-/

Not an issue in Firefox Android.

ColmDC commented 1 month ago

Not an issue in Firefox Android.

I reckon I can set some zoom configs to sidestep the bug.

wu-lee commented 1 month ago

Map tab name -> North America Co-ops

Fixed

wu-lee commented 1 month ago

The US ones have a major eastern offset.

But only in Firefox. If I zoom in it resets correctly, so it's not the data. :-/

Doesn't seem to happen for me? Is there some trick to replicating this?

ColmDC commented 1 month ago

It's also appearing in Edge, so not Firefox specific. Lots of errors and warnings in the Firefox console. console-export-2024-9-30_21-21-28.txt dev.maps.coop-1727727657849.log

ColmDC commented 1 month ago

I wonder whether setting the sidepanel to open by default will sidestep it, as changing the initial zoom settings didn't help. Setting the sidepanel to default as open doesn't seem to be something that can be set via url command. Can you set it?

wu-lee commented 1 month ago

It's also appearing in Edge, so not Firefox specific. Lots of errors and warnings in the Firefox console. console-export-2024-9-30_21-21-28.txt dev.maps.coop-1727727657849.log

Those look fairly normal - can't see any actual errors or stack traces.

ColmDC commented 1 month ago

Those look fairly normal - can't see any actual errors or stack traces.

It all looks very red when in the live console.

ColmDC commented 1 month ago

We seem to have lost the NCBA Website entries. Look at Slice of New York for example. There are two entries for it. This is probably because, despite having dotcoop domains, it doesn't list any of them as its website. Actually the website from the NCBA entry is not getting through. This is not a new issue. It seems to have been missing in https://dev.maps.solidarityeconomy.coop/qa/dc-ica-ncba-cuk/ too.

Might be that we seem to map Company Domain Name to Domain for NCBA data. I wonder should it be mapped to Website?

wu-lee commented 1 month ago

I wonder whether setting the sidepanel to open by default will sidestep it, as changing the initial zoom settings didn't help. Setting the sidepanel to default as open doesn't seem to be something that can be set via url command. Can you set it?

I can see the effect on my desktop when the window is occupying the whole screen.

But enabling the sidebar (locally) doesn't resolve it.

We seem to have lost the NCBA Website entries.

Hmm.. Do you mean the link to a website on NCBA members' pop-ups? [edit: oh you do]

As you say, there seems to be no website field in the original data, just a "Company Domain" field, which is mapped to Domain and used to do the joins to the DC data. The problem with just domains is that they often don't work when naively converted to a URL: they often just go to a parking page, or nothing. This is why we stopped showing the DC domains as websites, so it seems likely the same logic applies.
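To make "naively converted" concrete, here is a minimal sketch of a domain-to-URL conversion. It only builds a syntactically plausible URL; it cannot tell a live site from a parking page, which would need an actual HTTP request and is left out. The function name and the default `https` scheme are assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit

def domain_to_url(domain: str, scheme: str = "https") -> Optional[str]:
    """Naively turn a bare domain into a URL; return None if it is unusable."""
    cleaned = (domain or "").strip().lower()
    if not cleaned:
        return None
    # Tolerate values that already carry a scheme or a path.
    if "://" not in cleaned:
        cleaned = f"{scheme}://{cleaned}"
    parts = urlsplit(cleaned)
    host = parts.hostname or ""
    # Require at least one dot so values like "localhost" are rejected.
    if "." not in host:
        return None
    return f"{parts.scheme}://{host}/"

url = domain_to_url(" Example.ORG ")
```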

wu-lee commented 1 month ago

Will have a look at the clustering offset thing tomorrow, run out of steam for tonight.

ColmDC commented 1 month ago

The problem with just domains is that they often don't work when naively converted to a URL: they often just go to a parking page, or nothing.

That is a reasonable starting assumption but I've taken a look and all the 20 odd I randomly tried resolved to a real website, so they are worth displaying as websites as well as using any .coop ones to connect to other data sets.

ColmDC commented 1 month ago

I can see the effect on my desktop when the window is occupying the whole screen.

Good. Hard to fix if you can't replicate.

My current hunch is that it is caused by a tug of war between the zoom to include all markers and the limit on how far west you can pan. I wonder if we can relax the screen panning boundaries, or put another test co-op in Japan somewhere.
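The hunch can be illustrated with some Web Mercator arithmetic. This is only a sketch of the suspected mechanism, not a diagnosis: it assumes 256-pixel tiles, pan bounds of ±180° longitude, a fitted centre of 100°W and zoom level 3, all of which are made-up values.

```python
def visible_lon_span(viewport_px: int, zoom: int, tile_px: int = 256) -> float:
    """Degrees of longitude visible across the viewport in Web Mercator."""
    return 360.0 * viewport_px / (tile_px * 2 ** zoom)

def clamp_centre_lon(centre: float, span: float,
                     west: float = -180.0, east: float = 180.0) -> float:
    """Shift the view centre so the view stays inside [west, east] where possible."""
    if span >= east - west:
        return (west + east) / 2   # view wider than the bounds: centre on them
    half = span / 2
    return min(max(centre, west + half), east - half)

# Wide window: the view cannot be centred on 100W without crossing 180W,
# so the clamp drags the centre far to the east.
wide = clamp_centre_lon(-100.0, visible_lon_span(1920, 3))
# Narrow window: the same centre fits inside the bounds and is left alone.
narrow = clamp_centre_lon(-100.0, visible_lon_span(900, 3))
```

Under these assumptions the wide view is recentred at 11.25°W while the narrow one stays at 100°W, which would be consistent with the effect appearing only in wide windows.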

wu-lee commented 1 month ago

The problem with just domains is that they often don't work when naively converted to a URL: they often just go to a parking page, or nothing.

That is a reasonable starting assumption but I've taken a look and all the 20 odd I randomly tried resolved to a real website, so they are worth displaying as websites as well as using any .coop ones to connect to other data sets.

Ok, I have regenerated the NCBA to include the domains as website URLs... the map is showing them now.

Of course the randomly selected case I tried didn't resolve to a website so YMMV...

ColmDC commented 1 month ago

Ok, I have regenerated the NCBA to include the domains as website URLs... the map is showing them now.

I think it is still useful to distinguish dotcoop registered domains from websites. For example, say a co-op is in the NCBA and has registered several .coop domains. It uses one of the domains as its main website and declares so in its NCBA data. At the moment all the registered domains are displayed as websites. They all generate hyperlinks, which usually are not resolvable, and you can't tell which was the website declared in NCBA data, which would resolve.

ColmDC commented 1 month ago

I can see the effect on my desktop when the window is occupying the whole screen.

And when I use a narrower browser than I usually have open, the bug does not appear. I can't see a config setting that would allow panning wider than currently supported. Perhaps it's hard coded. :-/

wu-lee commented 1 month ago

Ok, I have regenerated the NCBA to include the domains as website URLs... the map is showing them now.

I think it is still useful to distinguish dotcoop registered domains from websites. For example, say a co-op is in the NCBA and has registered several .coop domains. It uses one of the domains as its main website and declares so in its NCBA data. At the moment all the registered domains are displayed as websites. They all generate hyperlinks, which usually are not resolvable, and you can't tell which was the website declared in NCBA data, which would resolve.

All I did just now was add the single domain listed in the NCBA dataset to the Website field in the standard.csv version of that dataset, converted to a URL. So if some of them have multiple domains listed, I think that they're coming from somewhere else, and this also implies they were there before.

In this case, looks like the links are coming from the DC data, which also has the domains mapped to URLs in the Website field, and it appears in the final result because the DC.Website field has a higher priority than the NCBA.Website field.

So it's a field prioritisation problem, partly.
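The prioritisation effect can be shown with a small coalescing function: the first dataset in the priority list with a non-empty value wins. This is a sketch, not the merge script's real logic; the function name, the dataset keys and the reserved `.example` URLs are all hypothetical.

```python
from typing import Dict, List

def pick_field(sources: Dict[str, Dict[str, str]], field: str, priority: List[str]) -> str:
    """Return the first non-empty value of `field`, trying datasets in priority order."""
    for name in priority:
        value = (sources.get(name, {}).get(field) or "").strip()
        if value:
            return value
    return ""

# One organisation as seen by two datasets (made-up values).
sources = {
    "DC": {"Website": "https://parked-domain.example/"},
    "NCBA": {"Website": "https://main-site.example/"},
}
dc_first = pick_field(sources, "Website", ["DC", "NCBA"])
ncba_first = pick_field(sources, "Website", ["NCBA", "DC"])
```

With DC ranked first, the registered domain masks the website declared in the NCBA data; swapping the order for the Website field alone would surface the declared site.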

I would not be surprised if there were a bunch more problems like this that we might find if we keep looking. Since the demo is today I'm not sure how much more time to spend on unpicking these loose threads. Presumably I shouldn't go and try to resolve these problems now?

ColmDC commented 1 month ago

So it's a field prioritisation problem, partly.

Agreed.

I would not be surprised if there were a bunch more problems like this that we might find if we keep looking. Since the demo is today I'm not sure how much more time to spend on unpicking these loose threads. Presumably I shouldn't go and try to resolve these problems now?

Demo went well. There is no commitment to make any further fixes to this demo. So will close this ticket. But will create a new ticket for the work to refactor the data merging scripts to make the next bout of data merging easier and see if they can address the shortfall noted in this ticket.

ColmDC commented 1 month ago

V will have a few other opportunities to demo it during the rest of the conference, so don't disable the map.