filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application]Mooc #223

Closed Mooc-1 closed 1 year ago

Mooc-1 commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

As a popular IT skills learning website in China, MOOC has focused on online IT education since its establishment in 2013. With the responsibility of cultivating practical talent for Internet companies, we invite technical experts from top companies to create cutting-edge, high-quality IT courses. The courses empower developers around the world to realize their career ambitions. Course coverage: 60 mainstream technical topics such as front-end, Java, Python, Go, artificial intelligence, big data, and mobile. The courses meet the practical needs of interview preparation, career growth, and self-improvement, and help users progress from skills upgrading to employment.

What is the primary source of funding for this project?

From our company's account.

What other projects/ecosystem stakeholders is this project associated with?

No one.

Use-case details

Describe the data being stored onto Filecoin

We provide different kinds of courses for users, so we need to store the data for these courses on Filecoin.

Where was the data in this dataset sourced from?

It comes from our course libraries. We collect content, develop it into engaging courses, and then upload it.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

Yes.
[TypeScript.zip](https://github.com/filecoin-project/filecoin-plus-large-datasets/files/7957220/TypeScript.zip)

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Ok. We confirm that these courses are public data.

What is the expected retrieval frequency for this data?

About 1 time per year.

For how long do you plan to keep this dataset stored on Filecoin?

3 years.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

China and other countries in Asia.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Yes. We'll use offline transfer when our network is poor.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We will choose stable miners with strong storage capability.

How will you be distributing deals across storage providers?

I will allocate deals based on transmission distance and time cost.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, we have the resources to complete it.

Destore2023 commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebkpgskjwg3h7xsd7hhvgthdxsdwufb2aky6dw3feggyao2k7o5cu

Address

f1qjrlejxv5a73kxurwtxyrni6ji7orm74zeeetiy

Datacap Allocated

800.00TiB

Signer Address

f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebkpgskjwg3h7xsd7hhvgthdxsdwufb2aky6dw3feggyao2k7o5cu

large-datacap-requests[bot] commented 2 years ago

DataCap Allocation requested

Request number 11

Multisig Notary address

f01858410

Client address

f1qjrlejxv5a73kxurwtxyrni6ji7orm74zeeetiy

DataCap allocation requested

800TiB

large-datacap-requests[bot] commented 2 years ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1qjrlejxv5a73kxurwtxyrni6ji7orm74zeeetiy

Last two approvers

swatchliu & Alex11801

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

800TiB

Total DataCap granted for client so far

3.12PiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

1.87PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 94396 | 21 | 800TiB | 17.66 | 196.54TiB |

kernelogic commented 2 years ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedai7f5xdkjrktgi2ym2d5ip5wvbgd2vy2ewl4s7vdmbniaeahv24

Address

f1qjrlejxv5a73kxurwtxyrni6ji7orm74zeeetiy

Datacap Allocated

800.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedai7f5xdkjrktgi2ym2d5ip5wvbgd2vy2ewl4s7vdmbniaeahv24

UnionLabs2020 commented 2 years ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacec7gvoip775fjdz27ufttfdn4f2tvbit3k2zuv6toethotlw6faqa

Address

f1qjrlejxv5a73kxurwtxyrni6ji7orm74zeeetiy

Datacap Allocated

800.00TiB

Signer Address

f17xdri3wunqgld7dm23e4f3eqsntjakwc47xjo6i

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec7gvoip775fjdz27ufttfdn4f2tvbit3k2zuv6toethotlw6faqa

UnionLabs2020 commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebugcfokx5av36c4yxwnvt3fkkmmwykqtdnf6dfcyx3zpwufxhx3w

Address

f1qjrlejxv5a73kxurwtxyrni6ji7orm74zeeetiy

Datacap Allocated

800.00TiB

Signer Address

f17xdri3wunqgld7dm23e4f3eqsntjakwc47xjo6i

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebugcfokx5av36c4yxwnvt3fkkmmwykqtdnf6dfcyx3zpwufxhx3w

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider has taken a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ 86.61% of total deal sealed by f01868890 are duplicate data.

⚠️ 22.93% of total deal sealed by f01896513 are duplicate data.

⚠️ 23.27% of total deal sealed by f01895990 are duplicate data.

⚠️ 30.74% of total deal sealed by f01889046 are duplicate data.

⚠️ 83.02% of total deal sealed by f01845667 are duplicate data.

⚠️ f01844999 has unknown IP location.

⚠️ 77.23% of total deal sealed by f01885280 are duplicate data.

⚠️ 78.80% of total deal sealed by f01880897 are duplicate data.

⚠️ 77.93% of total deal sealed by f01845679 are duplicate data.

⚠️ f01845679 has unknown IP location.

⚠️ 75.93% of total deal sealed by f01885260 are duplicate data.

⚠️ 82.46% of total deal sealed by f01830424 are duplicate data.

⚠️ f01830424 has unknown IP location.

⚠️ 77.94% of total deal sealed by f01880896 are duplicate data.

⚠️ 77.48% of total deal sealed by f01880894 are duplicate data.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01868890 | Boardman, Oregon, US | 526.56 TiB | 9.49% | 70.53 TiB | 86.61% |
| f01896513 | Singapore, Singapore, SG | 403.94 TiB | 7.28% | 311.31 TiB | 22.93% |
| f01895990 | Singapore, Singapore, SG | 396.50 TiB | 7.15% | 304.25 TiB | 23.27% |
| f01832632 | San Jose, California, US | 280.78 TiB | 5.06% | 280.78 TiB | 0.00% |
| f01889046 | Ashburn, Virginia, US | 270.50 TiB | 4.88% | 187.34 TiB | 30.74% |
| f01845667 | Hilliard, Ohio, US | 267.75 TiB | 4.83% | 45.47 TiB | 83.02% |
| f01895913 | Singapore, Singapore, SG | 239.31 TiB | 4.31% | 204.69 TiB | 14.47% |
| f01844999 (new) | Unknown | 185.91 TiB | 3.35% | 185.91 TiB | 0.00% |
| f01832653 | Seoul, Seoul, KR | 169.72 TiB | 3.06% | 169.72 TiB | 0.00% |
| f01828096 | San Jose, California, US | 157.63 TiB | 2.84% | 157.63 TiB | 0.00% |
| f01807908 | San Jose, California, US | 156.03 TiB | 2.81% | 156.03 TiB | 0.00% |
| f01780723 | Boardman, Oregon, US | 146.50 TiB | 2.64% | 146.50 TiB | 0.00% |
| f01788210 | Boardman, Oregon, US | 135.16 TiB | 2.44% | 135.16 TiB | 0.00% |
| f01788231 | San Jose, California, US | 120.81 TiB | 2.18% | 120.81 TiB | 0.00% |
| f01778041 | Hilliard, Ohio, US | 113.00 TiB | 2.04% | 113.00 TiB | 0.00% |
| f01788199 | Boardman, Oregon, US | 111.41 TiB | 2.01% | 111.41 TiB | 0.00% |
| f01780720 | Boardman, Oregon, US | 111.06 TiB | 2.00% | 111.06 TiB | 0.00% |
| f01788202 | Boardman, Oregon, US | 111.00 TiB | 2.00% | 111.00 TiB | 0.00% |
| f01788206 | Boardman, Oregon, US | 110.13 TiB | 1.98% | 110.13 TiB | 0.00% |
| f01885280 | Hong Kong, Central and Western, HK | 97.31 TiB | 1.75% | 22.16 TiB | 77.23% |
| f01880897 | Singapore, Singapore, SG | 97.00 TiB | 1.75% | 20.56 TiB | 78.80% |
| f01845679 | Unknown | 92.59 TiB | 1.67% | 20.44 TiB | 77.93% |
| f01885260 | Hong Kong, Central and Western, HK | 87.38 TiB | 1.57% | 21.03 TiB | 75.93% |
| f01769554 | San Jose, California, US | 83.00 TiB | 1.50% | 83.00 TiB | 0.00% |
| f01844043 | Hilliard, Ohio, US | 79.72 TiB | 1.44% | 79.72 TiB | 0.00% |
| f01844118 | Hilliard, Ohio, US | 77.25 TiB | 1.39% | 77.25 TiB | 0.00% |
| f01843994 | Hilliard, Ohio, US | 75.72 TiB | 1.36% | 75.72 TiB | 0.00% |
| f01844613 | Hilliard, Ohio, US | 75.19 TiB | 1.36% | 75.19 TiB | 0.00% |
| f01830424 (new) | Unknown | 65.11 TiB | 1.17% | 11.42 TiB | 82.46% |
| f01844232 | Hilliard, Ohio, US | 65.00 TiB | 1.17% | 65.00 TiB | 0.00% |
| f01843602 | San Jose, California, US | 56.00 TiB | 1.01% | 56.00 TiB | 0.00% |
| f01843889 | Hilliard, Ohio, US | 56.00 TiB | 1.01% | 56.00 TiB | 0.00% |
| f01843974 | Hilliard, Ohio, US | 56.00 TiB | 1.01% | 56.00 TiB | 0.00% |
| f01843549 | San Jose, California, US | 54.97 TiB | 0.99% | 54.97 TiB | 0.00% |
| f01843540 | San Jose, California, US | 54.97 TiB | 0.99% | 54.97 TiB | 0.00% |
| f01889480 | Boardman, Oregon, US | 53.44 TiB | 0.96% | 53.44 TiB | 0.00% |
| f01843673 | Hilliard, Ohio, US | 53.38 TiB | 0.96% | 53.38 TiB | 0.00% |
| f01844329 | Hilliard, Ohio, US | 53.22 TiB | 0.96% | 53.22 TiB | 0.00% |
| f01880896 | Singapore, Singapore, SG | 51.00 TiB | 0.92% | 11.25 TiB | 77.94% |
| f01880894 | Singapore, Singapore, SG | 50.78 TiB | 0.92% | 11.44 TiB | 77.48% |
| f01653777 | Hilliard, Ohio, US | 49.97 TiB | 0.90% | 49.97 TiB | 0.00% |
| f01759262 | San Jose, California, US | 49.97 TiB | 0.90% | 49.97 TiB | 0.00% |
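
The Duplicate Deals column in the provider table above appears to follow from the other two size columns as 1 - (Unique Data / Total Deals Sealed). The checker does not state this formula; it is an assumption inferred from the reported figures. A minimal Python sketch under that assumption, with a hypothetical helper name and three rows copied from the table:

```python
def duplicate_percentage(total_tib: float, unique_tib: float) -> float:
    """Percent of sealed deal volume that is not unique data.

    Assumed relationship (inferred from the report, not documented by it):
    duplicate % = (1 - unique / total) * 100
    """
    if total_tib <= 0:
        raise ValueError("total_tib must be positive")
    return round((1 - unique_tib / total_tib) * 100, 2)


# (Total Deals Sealed, Unique Data) in TiB, copied from the provider table.
rows = {
    "f01868890": (526.56, 70.53),
    "f01896513": (403.94, 311.31),
    "f01832632": (280.78, 280.78),
}

for provider, (total, unique) in rows.items():
    print(provider, f"{duplicate_percentage(total, unique):.2f}%")
```

For these three rows the sketch reproduces the reported 86.61%, 22.93%, and 0.00%.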

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 3.69 PiB | 4.73 PiB | 1 | 87.29% |
| 129.69 TiB | 463.44 TiB | 2 | 8.35% |
| 35.06 TiB | 241.72 TiB | 3 | 4.36% |
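
The replication warning above can be checked against the table with a short calculation. This is a sketch only: it assumes Deal Percentage is each row's Total Deals Made divided by the sum of that column, and that 1 PiB = 1024 TiB; the function names and the threshold parameter are hypothetical, with 4 taken from the warning text.

```python
TIB_PER_PIB = 1024

# (Total Deals Made in TiB, Number of Providers), copied from the table.
rows = [
    (4.73 * TIB_PER_PIB, 1),
    (463.44, 2),
    (241.72, 3),
]


def deal_percentages(rows):
    """Each row's share of total deal volume, in percent."""
    total = sum(size for size, _ in rows)
    return [round(size / total * 100, 2) for size, _ in rows]


def share_below_threshold(rows, min_providers=4):
    """Percent of deal volume replicated across fewer than min_providers."""
    total = sum(size for size, _ in rows)
    below = sum(size for size, n in rows if n < min_providers)
    return round(below / total * 100, 2)


print(deal_percentages(rows))
print(share_below_threshold(rows))
```

Under these assumptions the per-row shares match the reported 87.29%, 8.35%, and 4.36%, and because no row reaches 4 providers, the share below the threshold is 100%.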

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually different applications own different data, which should not resolve to the same CID.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Verifier |
| --- | --- | --- | --- | --- |
| f1tivtdkq34mnfzso63wrr2xh6r2zo7aebao4jfla | China travel service(HongKong) limited | 824.66 TiB | 8,631 | LDN v3 multisig |
| f1ki4cjk27ypzjsuzkvvqzs6xobaweuoxfdcnpqhy | TimeQuant | 377.88 TiB | 3,418 | LDN v3 multisig |
| f1727w2vwjctfo7hflr5trgqkl3c7texh7pl4grzq | GMverse | 368.59 TiB | 4,877 | LDN v3 multisig |
| f14abwn2goturifmt27s2bssoe3fup2b3npkgfzui | KONGFU VR | 357.44 TiB | 5,407 | LDN v3 multisig |
| f1bb2z36lpq3pnwiiowiraagpzqnpow4bonjacx7a | Hola Space | 193.38 TiB | 2,514 | LDN v3 multisig |
| f1ffbtksiso5za73pypuxbxznng6rdv64nbihdtba | EpikProtocol | 171.31 TiB | 2,336 | LDN # 281 |
| f1tgii4xgcp6um4n2w7eqaw6bwvnvyii2u7cjgzka | Meson.Network | 151.91 TiB | 1,050 | LDN v3 multisig |
| f1dx7ljly4vmk5x2bn5ng4jpbqnuvozrz2phw42qy | CoinPhD | 110.56 TiB | 1,411 | LDN v3 multisig |
| f1ivsvpljx4ovp2tth73pllkxmumrudemks623iia | TIS Inc. group | 101.06 TiB | 1,304 | LDN v3 multisig |
| f1ojo3wo24zkeovndlxfpqmlgzlbysg5fiukmup4a | Titansoft | 99.56 TiB | 798 | LDN v3 multisig |
| f1smfrznoqsp7eptqnselnosng7vxuo3ptnoqv2oa | CTG Business Service | 89.28 TiB | 515 | LDN v3 multisig |
| f1clwd2dooy2cflfilhzeq2ycl544b3heqsiwhjrq | FiboAI | 72.22 TiB | 1,412 | LDN v3 multisig |
| f1u522dt3zabkgbxxzecl77wgv473yvpecxh6t6zq | Huifan livestream | 61.84 TiB | 966 | LDN v3 multisig |
| f1fi7vzu266bh7yxsygw6scl2hjltdkhfb6nyzn3a | Jiliantech | 46.97 TiB | 977 | LDN # 190 |
| f1kqhhvqxvtftur2qqsxowk5nhfp5bgt2wtwen6jy | Sinso | 44.63 TiB | 285 | LDN # 62 |
| f1nmhjvibl7acp45lpxmo5kduz42evjazsyi4mzza | ubuckvnockai | 31.03 TiB | 482 | LDN v3 multisig |
| f1hik4fc3wvck2jchtufrnfnzmmskw2nyezlbsr3q | NFTSCAN | 20.03 TiB | 237 | LDN v3 multisig |
| f1qdycz5jgdcg4lerfpkujikkwcj7wq7yiqnmubni | National library | 13.22 TiB | 221 | LDN # 70 |
| f17qd6x3leh5pa7vh6ewdaed7qhbn2mgofrokuayy | Drust | 10.09 TiB | 125 | LDN v3 multisig |
| f3q7ablez3jqkcjukwbzaql7lmbx4ldouu66nexpdcfvu6kgho3v6gricckt77cgr46tdre2l4zmvha7bsu7qq | MatrixStorage | 2.75 TiB | 10 | LDN # 72 |
| f1dpgqn57cl5wqijiyv3256nids2ox3fms2mg3oay | All Blue | 1.56 TiB | 7 | LDN v3 multisig |
| f1ffr7uhq7mszotfdybexcnl4i7xw5tqyp5fhqy3y | MyTrade Technology Limited | 896.00 GiB | 9 | LDN v3 multisig |
| f3woqxpu6ekmj43nmpcv7j2pgu6lejxtzgxpzl6f2vrueoqlzjntakyhdkghymyffbzfbsio6dvfmy643x4y7q | RICH ST PETE LLC | 384.00 GiB | 2 | LDN v3 multisig |
| f1rnwp6p5kx7pcwexpucoav6mmwz3rkxw5blrboyy | MigoVideo | 224.00 GiB | 2 | LDN v3 multisig |
| f1vtcetpapwwyka6txwqjwymsa67npdfilbkwh64a | Chimay | 160.00 GiB | 2 | LDN v3 multisig |
| f1ajdk7e6ex7fvtjtydzgys47bcg7jub2baucun7y | Ruikuyun Information Technology Co., Ltd | 96.00 GiB | 1 | LDN v3 multisig |
| f1avhqgmeazxzvylz34l26o3nlj2ywnzaviwns3dq | 87v5-CarsonVideo | 96.00 GiB | 2 | LDN v3 multisig |
| f1einobkrjcjk6gfc5ov6663vrri75hwdsjfs6pmq | Cansoti | 64.00 GiB | 1 | LDN v3 multisig |
| f1muygjb5zowppbhr7wnmdreqdoiyyanfm7fe54dy | MappingFunk Protocol | 32.00 GiB | 1 | LDN v3 multisig |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

kernelogic commented 1 year ago

I gave my trust because iMooc is a well-known entity hosting educational courses, but the CID sharing revealed the data is all fake. This is really heartbreaking to see.

herrehesse commented 1 year ago

@kernelogic we share the same feeling.

cryptowhizzard commented 1 year ago

Hey @kernelogic

We all have this. The question is what we will learn from it. We are all human, and we are all different. Some take more risk than others, and some are on the defensive side, but together we will work this out.

Most of the "non-compliant" data storage is now out in the open. Let's try to unite and see if we can find a path forward for the benefit of Filecoin.

large-datacap-requests[bot] commented 1 year ago

This issue has reached the total DataCap requested and should be closed.

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1qjrlejxv5a73kxurwtxyrni6ji7orm74zeeetiy

Rule to calculate the allocation request amount

total dc reached

DataCap allocation requested

0

Total DataCap granted for client so far

7.275957614183433e+165YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

7.275957614183433e+165YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 128820 | 25 | 800TiB | 13.09 | 187.89TiB |

Aaronn85 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 12 storage providers sealed too much duplicate data - f01885280: 77.23%, f01885260: 75.93%, f01889046: 30.74%, f01845667: 83.02%, f01868890: 86.61%, f01896513: 22.93%, f01895990: 23.27%, f01880897: 78.80%, f01845679: 77.93%, f01830424: 82.46%, f01880896: 77.94%, f01880894: 77.48%

⚠️ 6 storage providers have unknown IP location - f02218611, f01927833, f02207907, f02182258, f01845679, f01830424

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.