X-lab2017 / open-digger

Open source analysis tools
https://open-digger.cn
Apache License 2.0
286 stars, 85 forks

[Cron] could you generate data for OpenGalaxy? #1208

Open tyn1998 opened 1 year ago

tyn1998 commented 1 year ago

Description

Reference: https://github.com/X-lab2017/open-galaxy/issues/34

Hi guys, could I request a cron task that generates data for OpenGalaxy? If you can help, please describe in the PR how you filtered nodes and links to reduce the size of the graph data.

Cron Expression

every month

frank-zsy commented 1 year ago

/self-assign

I can do this for you. To generate OpenGalaxy global data every month, I think we can reuse the condition from the basic metrics export task, which is openrank > e.

We can export all repo and user nodes with a monthly OpenRank value larger than e and activity larger than 2, to avoid too many edges.

With the thresholds above, we get a graph with 94,789 nodes and 133,960 edges for 2023-01, which is a desirable graph size.
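A minimal sketch of how such a threshold filter could look, not the actual export task. It assumes that `e` is Euler's number, that OpenRank is stored per node, and that activity is stored per edge; the thread does not spell out any of these, and the field names (`name`, `openrank`, `source`, `target`, `activity`) are hypothetical.

```python
import math


def filter_graph(nodes, edges, min_openrank=math.e, min_activity=2.0):
    """Return (kept_nodes, kept_edges) for one month of graph data.

    Assumed layout (hypothetical, not OpenDigger's real schema):
      nodes: list of {"name": str, "openrank": float}
      edges: list of {"source": str, "target": str, "activity": float}
    """
    # Keep only nodes whose OpenRank is strictly above the threshold.
    kept_nodes = [n for n in nodes if n["openrank"] > min_openrank]
    kept_names = {n["name"] for n in kept_nodes}

    # Keep an edge only if it is active enough and both endpoints survived.
    kept_edges = [
        e
        for e in edges
        if e["activity"] > min_activity
        and e["source"] in kept_names
        and e["target"] in kept_names
    ]
    return kept_nodes, kept_edges
```

Dropping edges whose endpoints were filtered out keeps the exported graph self-consistent, so a renderer never sees a link to a missing node.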

I will also normalize the edge length to the range 10 to 30 based on the activity score, so the graph renders properly in OpenGalaxy.
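One way to map activity scores onto that 10 to 30 range is linear min-max scaling, sketched below. This is an assumption: the thread does not say which scaling is used or whether higher activity should give a longer or a shorter edge, so the direction is left as a parameter. The function name `scale_edge_lengths` is hypothetical.

```python
def scale_edge_lengths(activities, lo=10.0, hi=30.0, invert=False):
    """Linearly map activity scores onto edge lengths in [lo, hi].

    invert=False: the most active edge gets length `hi`.
    invert=True:  the most active edge gets length `lo`.
    """
    if not activities:
        return []

    a_min, a_max = min(activities), max(activities)
    if a_max == a_min:
        # All scores are equal, so use the midpoint to avoid dividing by zero.
        return [(lo + hi) / 2 for _ in activities]

    span = a_max - a_min
    lengths = []
    for a in activities:
        t = (a - a_min) / span  # position of this score in [0, 1]
        if invert:
            t = 1.0 - t
        lengths.append(lo + t * (hi - lo))
    return lengths
```

For example, `scale_edge_lengths([2, 6, 10])` spreads the three edges evenly across the range, while `invert=True` would pull the most active pair closest together.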

A 3,000-iteration layout calculation will be used to generate positions in 3D space. I will try my best to give a continuous position layout result.

frank-zsy commented 1 year ago

I think we can also export label data to a new file so you can use it in OpenGalaxy to render different colors for the nodes.

tyn1998 commented 1 year ago

Hi @frank-zsy, thank you for explaining the export details.

By "also export label data", what do you mean? Is it another file different from https://oss.x-lab.info/open_galaxy/v2/labels.json?

frank-zsy commented 1 year ago

The label data here means the label data in OpenDigger, such as whether a repo belongs to a company or a foundation. Currently we have more than 10,000 repos and 380 orgs with labels, so it will cover many of the repo nodes and work well for color rendering.

tyn1998 commented 1 year ago

That sounds great!

frank-zsy commented 1 year ago

I would like to generate all the data for OpenGalaxy by month from 201501 to 202301 with continuous layout positions.

I will upload the data to OSS under folders named 201501 to 202301, which means you can set ?v=201812 in the URL to load the data for 201812. The default version will be the latest month, 202301 for now.

Does this make sense to you? @tyn1998

tyn1998 commented 1 year ago

Is yyyy-mm a valid folder name and a valid url param? If so, I prefer yyyy-mm.

frank-zsy commented 1 year ago

OK, I think it is
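A small sketch of the folder naming and version lookup discussed above, assuming the yyyy-mm format and the `v` query parameter. The helper names (`month_folders`, `resolve_version`) and the `example.com` URL are placeholders, not part of OpenGalaxy.

```python
from urllib.parse import parse_qs, urlparse


def month_folders(start="2015-01", end="2023-01"):
    """List yyyy-mm folder names from start to end, inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    folders = []
    while (year, month) <= (end_year, end_month):
        folders.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return folders


def resolve_version(url, folders):
    """Pick the folder named by ?v=..., falling back to the latest month."""
    query = parse_qs(urlparse(url).query)
    requested = query.get("v", [None])[0]
    return requested if requested in folders else folders[-1]


folders = month_folders()
# Placeholder URL; only the query string matters here.
selected = resolve_version("https://example.com/?v=2018-12", folders)
```

Falling back to the last folder when `v` is missing or unknown matches the idea of defaulting to the latest month.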

frank-zsy commented 1 year ago

@tyn1998 Do you plan to put more effort into OpenGalaxy? I found it really hard to produce a continuous layout for the 3D galaxy. Is the 2023-01 data enough for now? I cannot find a proper way to generate the data.

The parameters we should consider are:

So I think generating layouts for a timeline will be a long-term task.

tyn1998 commented 1 year ago

Hi, @frank-zsy, thanks a lot for your effort!

The knowledge and skills needed to generate continuous layouts for OpenGalaxy are indeed complex. I agree that more time and energy are needed to complete the challenge.

For now, the 2023-01 data is enough for building a demo application.

frank-zsy commented 1 year ago

@tyn1998 Thanks, I will look into the details in the future.

tyn1998 commented 1 year ago

Hi @frank-zsy, could you export OpenGalaxy data of 2023-02 and set it as the default data?

tyn1998 commented 11 months ago

Hello @frank-zsy, could you export the data of 2023-09 and make it default?