filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Boom Adverisng Media Co. Ltd. #852

Closed thrbowl closed 1 year ago

thrbowl commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

Beijing Boom Adverisng Media Co., Ltd. was established in 2019.

Our customers span e-commerce, games, finance, and other industries.

The company's business scope covers advertising design, production, and agency services; graphic design; logo design; computer graphic design and production; packaging design; enterprise planning; technology promotion services; brand planning; organizing cultural and artistic exchange activities; undertaking exhibitions; conference services; computer animation design; and technical consultation and technical promotion.

What is the primary source of funding for this project?

Income of the company and investment of shareholders.

What other projects/ecosystem stakeholders is this project associated with?

No other projects/ecosystem stakeholders.

Use-case details

Describe the data being stored onto Filecoin

Around 1.5 billion ad requests per month.

Public website data and product content created over about 4 years, including but not limited to a large amount of user information, images, video, audio, artwork, design, photography, and other types of data related to advertising.

Where was the data in this dataset sourced from?

Most of the data is produced by us; some of it is synced to us by customers.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

Here are some data samples. For privacy reasons, user data is not provided directly here.

Our data includes:
  1. User click logs
  2. Advertising analysis logs
  3. Advertising big data recommendations for each country

Because some of the materials involve customer privacy, it is inconvenient to show them in detail here. This is a screenshot of the volume of material displayed over part of the period. We confirm that a lot of storage space is required.

https://user-images.githubusercontent.com/109211726/183399725-a74b50ff-59e0-4ec6-a637-be4a791aee0d.png
http://www.boom-ad.cn/private/datareport2022.png
https://www.boom-ad.cn/dataview/data.zip
But we will not store private data.
For contractual and security reasons, customer-related data will only be stored in our own data center.

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, we confirm that the data is public and can be retrieved by anybody.

What is the expected retrieval frequency for this data?

Because making new products requires access to previous data, we want the data to be retrieved as often as possible.

For how long do you plan to keep this dataset stored on Filecoin?

We hope to keep it for at least 2 years.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Greater China

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Online transfer and offline copy.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We will choose stable, experienced storage providers and store different data with different providers, so that not all of the data can be damaged or lost at once.

If the application is approved, we will look for SPs based on subsequent needs and geographic location.

Currently all of our data is stored in AWS S3 and is archived automatically for 180 days.

Our storage needs to record all the details and keep the data for at least 10 years.
1. Snapshot archives (including videos) taken after an advertisement is released, which will exceed 100T every month.
2. User access logs: about 15-20T of compressed logs are generated every month.
3. Advertising analysis system logs: 10-20T of data are generated every month.
4. Advertising big data recommendations: a total of 15-20T of archived data is generated every month.

At present, about 1-1.2P of new storage demand is generated every year, and the annual increase is about 35-50%.

If DataCap can provide online video playback and automatic archiving, we can also put the video storage part on DataCap. The current data stock in this part is about 1.2-1.5P, and it is rising every year.

How will you be distributing deals across storage providers?

Less than 25% of the data for each storage provider.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Of course, we have made sufficient preparations.
stcloudlisa commented 1 year ago

The quota allocation for each SP is reasonable, and the maximum does not exceed 21%. I randomly spot-checked the data, and it is indeed data related to the company that applied. So far, this is an honest customer.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 9

Multisig Notary address

f02049625

Client address

f1ucw52kv3o7ztmikgmc5cdeq6msr5vdmnmeb3c5a

DataCap allocation requested

800TiB

Id

d0a40676-c153-4751-a3f5-e25682a12e0f

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1ucw52kv3o7ztmikgmc5cdeq6msr5vdmnmeb3c5a

Last two approvers

llifezou & 1LISA2

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

800TiB

Total DataCap granted for client so far

3.85PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

1.14PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 75764 | 11 | 800TiB | 19.72 | 181.43TiB |
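The figures in this stats comment can be cross-checked with a little arithmetic. The sketch below takes the printed values at face value (5 PiB requested in total, 3.85 PiB granted so far, 800TiB in this request) and derives the weekly amount implied by the "800% of weekly" rule. The weekly figure is not shown anywhere in this thread, so it is an inferred value, and the bot presumably works from unrounded byte counts, which may explain why it prints 1.14PiB where the rounded inputs give 1.15 PiB.

```python
TIB_PER_PIB = 1024

# Values copied from the bot comment above.
total_requested_pib = 5.0   # total amount requested by the client
granted_so_far_pib = 3.85   # "Total DataCap granted for client so far"
this_request_tib = 800.0    # "DataCap allocation requested"
rule_multiplier = 8.0       # "800% of weekly dc amount requested"

# DataCap still to be granted before this request is approved.
remaining_pib = total_requested_pib - granted_so_far_pib

# Weekly amount implied by the allocation rule (inferred, not stated in the thread).
implied_weekly_tib = this_request_tib / rule_multiplier

# What would be left to grant once this 800TiB tranche is approved.
remaining_after_tib = remaining_pib * TIB_PER_PIB - this_request_tib

print(f"remaining before this request: {remaining_pib:.2f} PiB")
print(f"implied weekly amount: {implied_weekly_tib:.0f} TiB")
print(f"remaining after this request: {remaining_after_tib:.1f} TiB")
```

The amount left after this tranche comes out in the same range as the 370TiB the bot asks for in request number 10.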
NDLABS-Leo commented 1 year ago

Top allocation: 16%. I am willing to support this application and will continue to follow it.

NDLABS-Leo commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebdmt5fivpvcxrrjcr63zetj4l3agdt4cljdlxyjuqoramjfd6eee

Address

f1ucw52kv3o7ztmikgmc5cdeq6msr5vdmnmeb3c5a

Datacap Allocated

800.00TiB

Signer Address

f1yayfsv6whu3rheviucvventj3y6t542xfpb47ei

Id

d0a40676-c153-4751-a3f5-e25682a12e0f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebdmt5fivpvcxrrjcr63zetj4l3agdt4cljdlxyjuqoramjfd6eee

newwebgroup commented 1 year ago

Check items

  1. Distribution proportion of SPs: top allocation is 16%+.
  2. Location distribution of SPs: located in four regions, namely Busan (South Korea), HK, the United States, and Singapore.
newwebgroup commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacec7htpaama6ktq5vls2mhyxctf4yhbrse7hm4fjftjlolm44d3txe

Address

f1ucw52kv3o7ztmikgmc5cdeq6msr5vdmnmeb3c5a

Datacap Allocated

800.00TiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

d0a40676-c153-4751-a3f5-e25682a12e0f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec7htpaama6ktq5vls2mhyxctf4yhbrse7hm4fjftjlolm44d3txe

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider takes verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01938721 | Hong Kong, Central and Western, HK | 476.94 TiB | 12.35% | 476.88 TiB | 0.01% |
| f01938718 | Morrisville, North Carolina, US | 472.25 TiB | 12.23% | 472.22 TiB | 0.01% |
| f01938717 | Singapore, Singapore, SG | 470.78 TiB | 12.19% | 470.78 TiB | 0.00% |
| f01938665 | Sham Shui Po, Sham Shui Po, HK | 469.03 TiB | 12.15% | 469.00 TiB | 0.01% |
| f01938714 | Sham Shui Po, Sham Shui Po, HK | 466.31 TiB | 12.08% | 466.31 TiB | 0.00% |
| f01852023 | Busan, Busan, KR | 334.50 TiB | 8.66% | 334.50 TiB | 0.00% |
| f01852325 | Hong Kong, Central and Western, HK | 329.41 TiB | 8.53% | 329.41 TiB | 0.00% |
| f01852677 | Morrisville, North Carolina, US | 322.81 TiB | 8.36% | 322.81 TiB | 0.00% |
| f01852664 | Singapore, Singapore, SG | 304.59 TiB | 7.89% | 304.59 TiB | 0.00% |
| f01851482 | Busan, Busan, KR | 213.78 TiB | 5.54% | 213.78 TiB | 0.00% |
| f01969202 `new` | Sham Shui Po, Sham Shui Po, HK | 192.00 GiB | 0.00% | 192.00 GiB | 0.00% |
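The Percentage column can be reproduced from the Total Deals Sealed column, which also makes it easy to compare the largest share against the "less than 25% per storage provider" limit stated in the application. This is a minimal sketch, assuming each percentage is simply a provider's sealed total divided by the sum over all listed providers; the checker's actual implementation is not shown in this thread.

```python
GIB_PER_TIB = 1024

# (provider, total deals sealed in TiB, percentage reported by the checker)
providers = [
    ("f01938721", 476.94, 12.35),
    ("f01938718", 472.25, 12.23),
    ("f01938717", 470.78, 12.19),
    ("f01938665", 469.03, 12.15),
    ("f01938714", 466.31, 12.08),
    ("f01852023", 334.50, 8.66),
    ("f01852325", 329.41, 8.53),
    ("f01852677", 322.81, 8.36),
    ("f01852664", 304.59, 7.89),
    ("f01851482", 213.78, 5.54),
    ("f01969202", 192.00 / GIB_PER_TIB, 0.00),  # reported in GiB
]


def shares(rows):
    """Return each provider's share of the summed sealed total, in percent."""
    total = sum(tib for _, tib, _ in rows)
    return {pid: 100.0 * tib / total for pid, tib, _ in rows}


computed = shares(providers)
for pid, _, reported in providers:
    print(f"{pid}: computed {computed[pid]:.2f}% (reported {reported:.2f}%)")

top_provider, top_share = max(computed.items(), key=lambda kv: kv[1])
print(f"largest share: {top_provider} at {top_share:.2f}%")
print("under the 25% limit:", top_share < 25.0)
```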

Provider Distribution

Deal Data Replication

The table below shows how many storage providers each piece of unique data is replicated across.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 384.00 GiB | 384.00 GiB | 1 | 0.01% |
| 4.59 TiB | 9.19 TiB | 2 | 0.24% |
| 52.72 TiB | 158.16 TiB | 3 | 4.10% |
| 269.53 TiB | 1.05 PiB | 4 | 27.93% |
| 522.72 TiB | 2.55 PiB | 5 | 67.70% |
| 96.00 GiB | 864.00 GiB | 8 | 0.02% |
| 32.00 GiB | 320.00 GiB | 9 | 0.01% |
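One way to read this table is to divide Total Deals Made by Unique Data Size, which gives the number of copies implied by each row, and compare that with the Number of Providers column. The sketch below does this with the printed (rounded) values. Where the implied copy count exceeds the provider count, one possible explanation is that some provider holds more than one copy, but that is an assumption and the report does not say so.

```python
# Size of each unit expressed in GiB.
UNIT_GIB = {"GiB": 1, "TiB": 1024, "PiB": 1024 ** 2}


def to_gib(text):
    """Convert a string such as '4.59 TiB' to a number of GiB."""
    value, unit = text.split()
    return float(value) * UNIT_GIB[unit]


# (unique data size, total deals made, number of providers), as printed above
rows = [
    ("384.00 GiB", "384.00 GiB", 1),
    ("4.59 TiB", "9.19 TiB", 2),
    ("52.72 TiB", "158.16 TiB", 3),
    ("269.53 TiB", "1.05 PiB", 4),
    ("522.72 TiB", "2.55 PiB", 5),
    ("96.00 GiB", "864.00 GiB", 8),
    ("32.00 GiB", "320.00 GiB", 9),
]

flagged = []
for unique, total, n_providers in rows:
    # Round to the nearest integer because the printed sizes are rounded.
    copies = round(to_gib(total) / to_gib(unique))
    if copies > n_providers:
        flagged.append((unique, copies, n_providers))
    print(f"{unique:>11}: {copies} implied copies across {n_providers} providers")

print("rows with more copies than providers:", flagged)
```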

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually different applications own different data, which should not resolve to the same CID.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 10

Multisig Notary address

f02049625

Client address

f1ucw52kv3o7ztmikgmc5cdeq6msr5vdmnmeb3c5a

DataCap allocation requested

370TiB

Id

9e0784c7-27b5-436b-a7e4-240dce721cf7

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1ucw52kv3o7ztmikgmc5cdeq6msr5vdmnmeb3c5a

Last two approvers

newwebgroup & not found

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

370TiB

Total DataCap granted for client so far

4.63PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

370TiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 143619 | 15 | 800TiB | 10.4 | 196.84TiB |
filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01938721 | Hong Kong, Central and Western, HK<br/>BIH-Global Internet Harbor | 476.94 TiB | 10.71% | 476.88 TiB | 0.01% |
| f01852325 | Hong Kong, Central and Western, HK<br/>BIH-Global Internet Harbor | 385.16 TiB | 8.65% | 385.13 TiB | 0.01% |
| f01938665 | Sham Shui Po, Sham Shui Po, HK<br/>China Unicom Global | 469.03 TiB | 10.53% | 469.00 TiB | 0.01% |
| f01938714 | Sham Shui Po, Sham Shui Po, HK<br/>China Unicom Global | 466.31 TiB | 10.47% | 466.31 TiB | 0.00% |
| f01852023 | Busan, Busan, KR<br/>Korea Telecom | 391.44 TiB | 8.79% | 391.44 TiB | 0.00% |
| f01851482 | Busan, Busan, KR<br/>Korea Telecom | 271.59 TiB | 6.10% | 271.59 TiB | 0.00% |
| f01938717 | Singapore, Singapore, SG<br/>StarHub Ltd | 470.78 TiB | 10.57% | 470.78 TiB | 0.00% |
| f01852664 | Singapore, Singapore, SG<br/>StarHub Ltd | 363.41 TiB | 8.16% | 363.41 TiB | 0.00% |
| f01938718 | Morrisville, North Carolina, US<br/>TierPoint, LLC | 472.25 TiB | 10.61% | 472.22 TiB | 0.01% |
| f01852677 | Morrisville, North Carolina, US<br/>TierPoint, LLC | 379.25 TiB | 8.52% | 379.25 TiB | 0.00% |
| f01969202 `new` | London, England, GB<br/>Zenlayer Inc | 63.72 TiB | 1.43% | 63.72 TiB | 0.00% |
| f01964073 | Jakarta, Jakarta, ID<br/>Zenlayer Inc | 62.50 TiB | 1.40% | 62.50 TiB | 0.00% |
| f01966534 | Bangkok, Bangkok, TH<br/>Zenlayer Inc | 61.47 TiB | 1.38% | 61.47 TiB | 0.00% |
| f01964002 | Kuala Lumpur, Kuala Lumpur, MY<br/>Zenlayer Inc | 60.38 TiB | 1.36% | 60.38 TiB | 0.00% |
| f01965334 | Mumbai, Maharashtra, IN<br/>Zenlayer Inc | 58.47 TiB | 1.31% | 58.47 TiB | 0.00% |
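Because several provider IDs in this report share the same second line under their location, it can be informative to group the sealed totals by that name as well as by provider ID. The sketch below assumes that the second line is the name of the network or hosting organization, which the report does not label explicitly; the grouping itself is only an illustration and is not something the checker reports.

```python
from collections import defaultdict

# (provider, organization name as printed under the location, total deals sealed in TiB)
rows = [
    ("f01938721", "BIH-Global Internet Harbor", 476.94),
    ("f01852325", "BIH-Global Internet Harbor", 385.16),
    ("f01938665", "China Unicom Global", 469.03),
    ("f01938714", "China Unicom Global", 466.31),
    ("f01852023", "Korea Telecom", 391.44),
    ("f01851482", "Korea Telecom", 271.59),
    ("f01938717", "StarHub Ltd", 470.78),
    ("f01852664", "StarHub Ltd", 363.41),
    ("f01938718", "TierPoint, LLC", 472.25),
    ("f01852677", "TierPoint, LLC", 379.25),
    ("f01969202", "Zenlayer Inc", 63.72),
    ("f01964073", "Zenlayer Inc", 62.50),
    ("f01966534", "Zenlayer Inc", 61.47),
    ("f01964002", "Zenlayer Inc", 60.38),
    ("f01965334", "Zenlayer Inc", 58.47),
]

# Sum the sealed totals per organization.
per_org_tib = defaultdict(float)
for _, org, tib in rows:
    per_org_tib[org] += tib

grand_total = sum(per_org_tib.values())
per_org_share = {org: 100.0 * tib / grand_total for org, tib in per_org_tib.items()}

# Print organizations from largest to smallest share.
for org, share in sorted(per_org_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{org}: {per_org_tib[org]:.2f} TiB ({share:.2f}%)")

top_org = max(per_org_share, key=per_org_share.get)
```

Grouped this way, the largest share is roughly double the largest single-provider percentage in the table, while still below the 25% figure given in the application.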

Provider Distribution

Deal Data Replication

The table below shows how many storage providers each piece of unique data is replicated across.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 6.56 TiB | 6.56 TiB | 1 | 0.15% |
| 9.41 TiB | 18.81 TiB | 2 | 0.42% |
| 31.06 TiB | 93.19 TiB | 3 | 2.09% |
| 127.66 TiB | 510.63 TiB | 4 | 11.47% |
| 764.41 TiB | 3.73 PiB | 5 | 85.84% |
| 32.00 GiB | 224.00 GiB | 7 | 0.00% |
| 32.00 GiB | 320.00 GiB | 8 | 0.01% |
| 96.00 GiB | 960.00 GiB | 9 | 0.02% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually different applications own different data, which should not resolve to the same CID.

However, this could happen if all of the clients below use the same software to prepare exactly the same dataset, or if they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

cryptowhizzard commented 1 year ago

Greetings,

I have been attempting retrievals for this application, but I could not complete any except for the demo data. The rest of the deals are inaccessible because the SPs either time out or are not open.

f01948527.log

Can you also tell me why f0155049-14 is involved? I don't understand the relationship.

[Three screenshots attached, taken 2023-01-14 at 16:49:51, 16:49:39, and 16:48:29]
cryptowhizzard commented 1 year ago

I am also interested in the relationship between what you actually store (see above) and what you say you are storing here:

> Our data includes
> 1 User clicks log
> 2 Advertising analysis logs
> 3 Advertising big data recommendation of each country
> Because some of the materials involve customer privacy, it is inconvenient to show them in detail here. This is a screenshot of the amount of material displayed at part of the time. We confirm that a lot of storage space is required.
>
> https://user-images.githubusercontent.com/109211726/183399725-a74b50ff-59e0-4ec6-a637-be4a791aee0d.png
> http://www.boom-ad.cn/private/datareport2022.png
> https://www.boom-ad.cn/dataview/data.zip
> But we will not store private data.
> For contractual and security reasons, customer-related data will only be stored in our own data center.

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! :exclamation: We have found some problems in the information provided.

- We could not find Organization Name field in the information provided
- We could not find Website / Social Media field in the information provided
- We could not find Total amount of DataCap being requested (between 500 TiB and 5 PiB) field in the information provided
- We could not find Weekly allocation of DataCap requested (usually between 1-100TiB) field in the information provided
- We could not find On-chain address for first allocation field in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.
large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 10

Multisig Notary address

f02049625

Client address

f1ucw52kv3o7ztmikgmc5cdeq6msr5vdmnmeb3c5a

DataCap allocation requested

370TiB

Id

fed5cfbd-8ca0-4e26-8afd-5cdaa991aaad

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1ucw52kv3o7ztmikgmc5cdeq6msr5vdmnmeb3c5a

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

370TiB

Total DataCap granted for client so far

7.275957614183433e+109YiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

-8.79B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 150842 | 15 | 800TiB | 9.91 | 0B |
C00kies77 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!