X-lab2017 / open-digger

Open source analysis tools
https://open-digger.cn
Apache License 2.0
286 stars, 85 forks

[Geo data] How can I contribute developer's geographic data to open-digger #1414

Closed PureNatural closed 10 months ago

PureNatural commented 10 months ago

Description

I currently have geographic location information for about 1 million developers, and more data will be parsed in the future.

How can I submit this data to open-digger? @frank-zsy

frank-zsy commented 10 months ago

@PureNatural Thanks. I am curious how the developers are identified: are they GitHub users, or users of other platforms? And what is the format of the geographic location information?

Currently, OpenDigger contains about 215 thousand GitHub users with location information.

The location information in OpenDigger right now is:

We parse each location into detailed fields on a best-effort basis, so administrative_area_level_1 and the more detailed fields may be null.

srsxyc commented 10 months ago

Developers are GitHub users. The location information is:

frank-zsy commented 10 months ago

@srsxyc It would be really great if you could give me the JSON file; I will import it into ClickHouse for further use. A few more questions to confirm:

srsxyc commented 10 months ago

The data source we used was the GitHub log data from 2015 to the present that you provided. We start by extracting the de-duplicated usernames from the log data. Then we crawl each user's information through the GitHub API. Finally, the location is parsed through the Bing Maps API. @frank-zsy
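The three steps described above (de-duplicate usernames, fetch each profile, geocode the location) can be sketched roughly as follows. This is a minimal illustration, not the code used in this thread: the `actor_login` field name, the record layout, and the `fetch_profile` / `geocode` callables are assumptions, with the callables standing in for the GitHub API and Bing Maps API calls. A `fetched_at` timestamp is stored with each record, since profile locations can change over time.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional


def unique_logins(events: Iterable[dict]) -> list:
    """Return de-duplicated logins, preserving first-seen order.

    Assumes each log event carries the username under "actor_login"
    (a hypothetical field name chosen for this sketch).
    """
    seen = {}
    for event in events:
        login = event.get("actor_login")
        if login:
            seen.setdefault(login, None)
    return list(seen)


def build_records(
    logins: Iterable[str],
    fetch_profile: Callable[[str], Optional[dict]],
    geocode: Callable[[str], Optional[dict]],
    clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> list:
    """Fetch each profile, geocode its free-text location, and timestamp it.

    fetch_profile and geocode are injected stand-ins for the real
    GitHub API and geocoding calls; both may return None.
    """
    records = []
    for login in logins:
        profile = fetch_profile(login) or {}
        raw_location = profile.get("location")
        geo = geocode(raw_location) if raw_location else None
        records.append(
            {
                "login": login,
                "raw_location": raw_location,
                "geo": geo,  # None when no location is set or parsing fails
                "fetched_at": clock().isoformat(),
            }
        )
    return records
```

Passing the network calls in as functions keeps the sketch runnable offline and makes it easy to add rate limiting or retries around the real API clients.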

frank-zsy commented 10 months ago

Great, can you give me the JSON file so I can import it into ClickHouse? By "when" I mean that users may change their location information at any time, so I need to know when you called the GitHub API to retrieve the data.

And are latitude and longitude often used for analysis? I can add the columns too.

srsxyc commented 10 months ago

We've only crawled data for 3 million users so far; the earliest was fetched in roughly March 2023 and the latest in roughly May 2023. Because there is still a large amount of data that has not been crawled, we have not updated it.

Latitude and longitude are important pieces of information for geolocation analysis. I think it's best to keep them.
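As one illustration of why coordinates are useful for analysis, the great-circle distance between two developers can be computed directly from latitude and longitude. This is a generic haversine sketch, not part of OpenDigger; the function name and the mean Earth radius constant are choices made for this example.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0088  # mean Earth radius, an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))
```

Place names alone cannot support this kind of distance or clustering calculation, which is one reason to keep the two columns.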

How do I give you the JSON file?

frank-zsy commented 10 months ago

@srsxyc Any form you like: compress it and send it to me over WeChat, share it through Baidu Pan, or upload it to OSS and share the link. Any of these is fine with me.

frank-zsy commented 10 months ago

@srsxyc Thanks for the data. All the user data has been inserted into the gh_user_info table, and I will insert the location info into the location_info table too.

frank-zsy commented 10 months ago

@srsxyc All the data has now been imported into ClickHouse, thanks a lot.
