filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <Jiajia tech> - <AI warehouse management> #1148

Closed whymichaelgu1 closed 1 year ago

whymichaelgu1 commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

Jiajia Technology was founded in 2015. Using AI, blockchain, and IoT (Internet of Things) technology, Jiajia Tech is committed to providing cutting-edge technology to benefit the commodity warehouse management industry. This project, the AI-blockchain warehouse management system, is a major part of the commodity-to-digital-asset program initiated by the department of warehouse and logistics in China.
In the commodity industry, companies such as commodity owners, warehouses, banks, commodity exchanges, and third-party technology companies such as Jiajia Tech have joined a consortium blockchain to sell commodities or secure collateral-based loans in ways that were not possible before. On this consortium blockchain, a commodity is represented as a blockchain warehouse receipt. Jiajia’s AI-blockchain warehouse management system uses massive amounts of video surveillance data to develop AI algorithms that detect movement of commodities in the warehouse. If a bank has granted a loan against a certain commodity and the system detects an unauthorized movement of that commodity, it automatically alerts all relevant parties, especially the bank. The accuracy of the AI algorithm is therefore critical. The pain point is that warehouses have neither the capacity nor the budget to store video surveillance for more than 10 days. That is not enough data for machine learning. Furthermore, for their own security, banks need far more than 10 days of warehouse surveillance before granting any loan. In other words, the further back the video surveillance is stored, the more likely the bank is to offer loans to the warehouse. Jiajia’s AI-blockchain warehouse management system is therefore looking for more cost-effective ways to store this valuable video surveillance data.
Jiajia’s AI-blockchain warehouse management system has won many awards from mainstream media among competing industry-focused blockchain applications. To name a few:
https://www.sohu.com/a/516309284_120610368
https://news.smm.cn/news/101433777
https://finance.qq.com/a/20200320/014078.htm
https://www.sohu.com/a/459136581_120610368
https://baijiahao.baidu.com/s?id=1676257629917146695&wfr=spider&for=pc
https://www.niuxuan.cn/redian/20521.html
I am the CTO of Jiajia Tech. I firmly believe Filecoin’s cost-effective and decentralized storage network will benefit the whole commodity industry in the long run.
https://www.sohu.com/na/444675307_120610368
http://m.chinawuliu.com.cn/zixun/202006/15/508273.shtml

What is the primary source of funding for this project?

Jiajia tech

What other projects/ecosystem stakeholders is this project associated with?

none

Use-case details

Describe the data being stored onto Filecoin

Mostly video surveillance data, along with some AI training datasets that include videos and pictures

Where was the data in this dataset sourced from?

our own servers

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://drive.google.com/drive/folders/1dX0CGisAda_jkXwwRdpVzH3mYY0SiyPx?usp=sharing
https://drive.google.com/drive/folders/1-eFRXA5P2IrksUx4n1LDL4Qf6uXdayGi?usp=sharing

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, I confirm this is a public dataset that can be retrieved by anyone on the network with no specific permissions or access rights required.

What is the expected retrieval frequency for this data?

at least once a month

For how long do you plan to keep this dataset stored on Filecoin?

2 years

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Asia 

How will you be distributing your data to storage providers? Is there an offline data transfer process?

mostly offline, some online 

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

I plan to find SPs with good reputations in the Filecoin Slack

How will you be distributing deals across storage providers?

I plan to find 6 SPs and distribute the deals equally.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

No additional support needed right now. Will reach out if I run into problems
Sunnyiscoming commented 1 year ago

Warehouse surveillance is not suitable for storing as a public dataset. The data samples you provided are not enough to prove you have 5 PB of data storage needs. Can you explain your data composition and provide sufficient data samples separately? How much original data do you have? How many copies will you store? Can you provide more detailed information about the storage providers participating in this program, for example by listing the SPs you have contacted so far?

whymichaelgu1 commented 1 year ago

@Sunnyiscoming Thank you for your feedback. This massive surveillance video data is not just for warehouses and financial institutions; it also serves as raw data for AI machine learning datasets. The data we intend to store on the Filecoin network is composed of original surveillance video and some AI machine learning video and picture annotations. I have provided more original surveillance video samples at this link: https://drive.google.com/drive/folders/1dX0CGisAda_jkXwwRdpVzH3mYY0SiyPx?usp=share_link These warehouses are all ALOT-enabled warehouses, which means there are at least 40 cameras covering every corner of each facility, and many ALOT-enabled nonferrous warehouses need to store this data. The original data is over 5P. We intend to store 4 copies (maybe 3 in the initial phases). As of right now, I have already contacted f01736786 (a China-based node), Linkspeed (a USA-based SP), and Holon (an Australia-based SP), and they have agreed to provide storage. I will reach out to more SPs through the Filecoin Slack channel.
Let me know if you need more information, thanks.

Sunnyiscoming commented 1 year ago

More data samples are needed.

whymichaelgu1 commented 1 year ago

@Sunnyiscoming more sample data has been updated in the link: https://drive.google.com/drive/folders/1dX0CGisAda_jkXwwRdpVzH3mYY0SiyPx?usp=share_link

Sunnyiscoming commented 1 year ago

Reconfirm the original data size. Each video is 50-100M in size.

whymichaelgu1 commented 1 year ago

@Sunnyiscoming the surveillance video comes off an NVR; each camera continuously records the scene. 50-100M might cover 30 minutes to 1 hour of recording.
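
The size question raised here can be checked with rough arithmetic. The sketch below is only an estimate under stated assumptions: it reads "50-100M per 30 minutes to 1 hour" as roughly 100 MiB per camera-hour, uses the 40-camera minimum per warehouse mentioned earlier in the thread, and uses hypothetical function names. It estimates how many warehouse-days of footage 5 PiB would represent.

```python
MIB_PER_PIB = 1024 ** 3  # 1 PiB = 1024^3 MiB


def daily_mib_per_warehouse(mib_per_camera_hour: float, cameras: int = 40) -> float:
    """Footage produced by one warehouse in 24 hours, in MiB."""
    return mib_per_camera_hour * 24 * cameras


def warehouse_days_for(total_pib: float, mib_per_camera_hour: float, cameras: int = 40) -> float:
    """Number of warehouse-days of continuous recording needed to fill total_pib."""
    return total_pib * MIB_PER_PIB / daily_mib_per_warehouse(mib_per_camera_hour, cameras)


if __name__ == "__main__":
    # Assumed rate: ~100 MiB per camera-hour (50 MiB per half hour).
    per_day = daily_mib_per_warehouse(100)
    print(f"{per_day / 1024:.2f} GiB per warehouse per day")
    print(f"{warehouse_days_for(5, 100):,.0f} warehouse-days for 5 PiB")
```

Under these assumptions a single warehouse produces a little under 100 GiB per day, so a 5 PiB dataset implies many warehouses, long retention, or both.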

whymichaelgu1 commented 1 year ago

@Sunnyiscoming I have been waiting for more than 40 days, is this still in progress?

cryptowhizzard commented 1 year ago

Dear applicant,

Thank you for applying for DataCap. As a Filecoin FIL+ notary, I am screening your application and conducting due diligence.

Looking at your application, I have some questions. As you are brand new on GitHub and have no history of past applications, applying for 5PB of DataCap seems like a lot to me. One needs comprehensive knowledge of Filecoin, packing of data, distribution of data, and all the requirements that come with it. Are you brand new to the Filecoin space, or have you applied for DataCap in the past under different GitHub account names?

Can you show us visible proof of the size of your data and the storage systems you have there?

As a last question, I would like you to fill out this form to provide us with the necessary information to make an educated decision on whether to support your LDN request.

Thanks!

whymichaelgu1 commented 1 year ago

@cryptowhizzard @Sunnyiscoming @raghavrmadya @Kevin-FF-USA I was an IPFS fan and was recommended to the Filecoin FIL+ program. Regardless of how the application goes, one cannot help noticing what this program has become. I don't mind @cryptowhizzard asking questions, the tougher the better. I just hope the notary community has the decency and courtesy to ask those questions 40 days earlier, especially @Sunnyiscoming. I mean, @Sunnyiscoming, are you that inefficient for all applications? I hope not. I really hope @raghavrmadya @Kevin-FF-USA @cryptowhizzard can do some DD on notaries as well. It has become a joke, a place to trade DataCap for profit. I doubt @raghavrmadya @Kevin-FF-USA don't know that; please check the applications that were approved at light speed and those left unattended for months. Good luck to Filecoin, but if this system keeps up this kind of corruption, I don't think Filecoin can go anywhere. The notary system is not a centralized system, it is a corrupted system, which is quite a shame for the web3.0 world.

Sunnyiscoming commented 1 year ago

Due to the large number of datasets, some applications were buried. Sorry for the delay. You can ask notaries to do due diligence directly.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

150TiB

Client address

f1xppiiufu6rn22zi6d6gq2ivc23updomtl7ainbq

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1xppiiufu6rn22zi6d6gq2ivc23updomtl7ainbq

DataCap allocation requested

75TiB

Id

34960cfe-6ccb-4df6-880f-a23ca2a60b4c

cryptowhizzard commented 1 year ago

Hi @whymichaelgu1

I want to get you going and appreciate you want to try Filecoin. Sorry it took so long.

Can you please fill out the KYC so I know who you are? I will try to get you moving ASAP.

whymichaelgu1 commented 1 year ago

@cryptowhizzard KYC submitted

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebqqjs7njf7njejxclnnfbnrtb7yl2scfspc4njxuyk3ilzjteyyc

Address

f1xppiiufu6rn22zi6d6gq2ivc23updomtl7ainbq

Datacap Allocated

75.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

34960cfe-6ccb-4df6-880f-a23ca2a60b4c

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebqqjs7njf7njejxclnnfbnrtb7yl2scfspc4njxuyk3ilzjteyyc

whymichaelgu1 commented 1 year ago

@Sunnyiscoming can you sign this LDN please, thank you

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebdkc35k2aymzflgsp534nysuwc4mzpnjccbbse67lsuevyms2ezi

Address

f1xppiiufu6rn22zi6d6gq2ivc23updomtl7ainbq

Datacap Allocated

75.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

34960cfe-6ccb-4df6-880f-a23ca2a60b4c

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebdkc35k2aymzflgsp534nysuwc4mzpnjccbbse67lsuevyms2ezi

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1xppiiufu6rn22zi6d6gq2ivc23updomtl7ainbq

DataCap allocation requested

150TiB

Id

0acd36ec-fe0b-4593-ab64-2116cda9de2c

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1xppiiufu6rn22zi6d6gq2ivc23updomtl7ainbq

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

150TiB

Total DataCap granted for client so far

75TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.92PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 1382 | 3 | 75TiB | 33.94 | 18.03TiB |
woshidama323 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

woshidama323 commented 1 year ago

can you explain why there are 800GiB sharing reports?

yaoyuanww commented 1 year ago

@woshidama323 for testing purpose

woshidama323 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceczkmip63d2xjrm72eq54yuioiwt4fdj5llodohv4pxb3umcxpdri

Address

f1xppiiufu6rn22zi6d6gq2ivc23updomtl7ainbq

Datacap Allocated

150.00TiB

Signer Address

f12tk3adljauwnd3hjbigpfxb7b7gdlj63p6afwtq

Id

0acd36ec-fe0b-4593-ab64-2116cda9de2c

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceczkmip63d2xjrm72eq54yuioiwt4fdj5llodohv4pxb3umcxpdri

Aaron01230 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

Aaron01230 commented 1 year ago

Why is the retrieval rate so low?

yaoyuanww commented 1 year ago

I asked the SPs. They did store the unsealed copy; they just did not turn on retrieval since it takes up bandwidth and they have limited bandwidth. They will turn it on from now on.

Aaron01230 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaced2xlsxa3dcn62xsoctop6yfhfzh4xcznqdez5detioellm33sq4q

Address

f1xppiiufu6rn22zi6d6gq2ivc23updomtl7ainbq

Datacap Allocated

150.00TiB

Signer Address

f1xrnysd4gimg64d4l6qi7ulzwwq22c6vfg6lpw3i

Id

0acd36ec-fe0b-4593-ab64-2116cda9de2c

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced2xlsxa3dcn62xsoctop6yfhfzh4xcznqdez5detioellm33sq4q

cryptowhizzard commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

cryptowhizzard commented 1 year ago

@yaoyuanww

Nothing changed. How long should we wait for your promise?

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

Aaron01230 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

yaoyuanww commented 1 year ago

@Kevin-FF-USA the bot did not trigger the next round, can you take a look, thanks

yaoyuanww commented 1 year ago

@kevzak the bot did not trigger the next round, can you take a look, thanks

kevzak commented 1 year ago

@yaoyuanww are you using the same address somewhere else? sometimes it causes issues

kevzak commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

yaoyuanww commented 1 year ago

@kevzak no, this address is just for this LDN

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1xppiiufu6rn22zi6d6gq2ivc23updomtl7ainbq

DataCap allocation requested

300TiB

Id

9370acf4-00f6-4dc1-b637-1c786fabf01f

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1xppiiufu6rn22zi6d6gq2ivc23updomtl7ainbq

Rule to calculate the allocation request amount

200% of weekly dc amount requested

DataCap allocation requested

300TiB

Total DataCap granted for client so far

13642.4YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

13642.4YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 3143 | 6 | 150TiB | 35.99 | 0B |
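
The allocation figures in the bot's stats reports can be sanity-checked by hand. The following is a minimal Python sketch, not the bot's actual code: the function names are hypothetical, and it assumes binary units (1 PiB = 1024 TiB) and that each tranche is a fixed percentage of the 150TiB weekly rate, as the "Rule to calculate the allocation request amount" lines suggest.

```python
TIB_PER_PIB = 1024  # binary units assumed


def tranche_tib(weekly_tib: float, percent_of_weekly: float) -> float:
    """Size of one allocation tranche as a percentage of the weekly usage rate."""
    return weekly_tib * percent_of_weekly / 100


def remaining_pib(total_pib: float, granted_tib: float) -> float:
    """DataCap still to be granted to reach the total requested, in PiB."""
    return (total_pib * TIB_PER_PIB - granted_tib) / TIB_PER_PIB


if __name__ == "__main__":
    weekly = 150  # TiB, from the DataCap request trigger
    print(tranche_tib(weekly, 100))   # second request rule: 100% of weekly
    print(tranche_tib(weekly, 200))   # third request rule: 200% of weekly
    print(f"{remaining_pib(5, 75):.3f} PiB left after the first 75TiB")
    print(f"{remaining_pib(5, 75 + 150):.3f} PiB left after the second 150TiB")
```

With 75TiB granted, the remainder comes out at about 4.927PiB, in line with the 4.92PiB shown in the earlier report (which appears to truncate). After the second tranche the granted total would be 225TiB, leaving roughly 4.78PiB, so the 13642.4YiB values in this report look like a display bug in the bot rather than real figures.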
Destore2023 commented 1 year ago

Hi @whymichaelgu1 Thank you for reaching me for your LDN allocation.

How long will this video surveillance be stored? Do you think that's necessary?

yaoyuanww commented 1 year ago

@Destore2023 It used to be one month, since the warehouses do not have the budget or capacity to store it for longer than that. With Filecoin, this data can be stored much longer, and that is absolutely necessary since the financial institutions that finance these commodities need longer video surveillance, the longer the better. For the commodity owners, it is a huge plus if the warehouse offers a financing service. So, in short, it is necessary.

Destore2023 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 71.19% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.
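
The replication warning in this report ("71.19% of deals are for data replicated across less than 4 storage providers") can be illustrated with a small calculation. The thread does not show how the checker computes it, so the sketch below is one plausible reading only: it assumes each deal is a (piece CID, provider ID) pair and counts the share of deals whose piece is held by fewer than 4 distinct providers. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def pct_deals_under_replicated(
    deals: Iterable[Tuple[str, str]], min_providers: int = 4
) -> float:
    """Percentage of deals whose piece is stored by fewer than min_providers SPs."""
    deals = list(deals)
    if not deals:
        return 0.0
    # Collect the distinct providers holding each piece.
    providers_by_piece = defaultdict(set)
    for piece_cid, provider in deals:
        providers_by_piece[piece_cid].add(provider)
    low = sum(
        1 for piece_cid, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return 100 * low / len(deals)


if __name__ == "__main__":
    # Made-up example: piece A has 4 replicas, piece B only 2.
    sample = [("A", "sp1"), ("A", "sp2"), ("A", "sp3"), ("A", "sp4"),
              ("B", "sp1"), ("B", "sp2")]
    print(f"{pct_deals_under_replicated(sample):.2f}% of deals under-replicated")
```

Under this reading, adding deals for piece B with two more providers would bring the figure to zero, which matches the applicant's stated plan of storing 4 copies.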

cryptowhizzard commented 1 year ago

Good morning,

It seems your retrieval is not working correctly. Can you please fix it before we move to the next allocation?

cat 1148-f01512680-f02230977-47111204-baga6ea4seaqcd4pexulcmdb64b3nfgwz363qbv7k2xhddbccgybhgxboa77esfi.log
ERROR: offer error: retrieval query offer errored: failed to fetch piece to retrieve from: getting pieces for cid bafykbzaceatsbvcz5ur5ee7mvks6ctmdm5l5mqfjyuvx4yimw6b6qknsle6wy: getting pieces containing block bafykbzaceatsbvcz5ur5ee7mvks6ctmdm5l5mqfjyuvx4yimw6b6qknsle6wy: failed to lookup index for mh a0e402202720d459ed23d213ecaaa5e14d836757d640a9c52b7e610cb783e829b2593d6c, err: datastore: key not found

Destore2023 commented 1 year ago

Good morning,

It seems your retrieval is not working correctly. Can you please fix it before we move to the next allocation?

cat 1148-f01512680-f02230977-47111204-baga6ea4seaqcd4pexulcmdb64b3nfgwz363qbv7k2xhddbccgybhgxboa77esfi.log
ERROR: offer error: retrieval query offer errored: failed to fetch piece to retrieve from: getting pieces for cid bafykbzaceatsbvcz5ur5ee7mvks6ctmdm5l5mqfjyuvx4yimw6b6qknsle6wy: getting pieces containing block bafykbzaceatsbvcz5ur5ee7mvks6ctmdm5l5mqfjyuvx4yimw6b6qknsle6wy: failed to lookup index for mh a0e402202720d459ed23d213ecaaa5e14d836757d640a9c52b7e610cb783e829b2593d6c, err: datastore: key not found

Hi @whymichaelgu1 Based on the fact that other notaries questioned your retrieval problem, can you improve it? If so, I'm willing to sign.

yaoyuanww commented 1 year ago

We checked all the SPs and found this error came from f02230977. They do support retrieval, but they modified the Boost code to increase efficiency, and as a result it does not allow retrieval while sealing. This SP probably won't change that back, so we will stop working with this specific miner, f02230977. We checked the other miners, and they support retrieval unconditionally. Attached is one example from f02231025.

edb990bc2d629e4391493cee8fe4812