filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application]HELIOS #305

Closed HELIOSSHANGHAI closed 1 year ago

HELIOSSHANGHAI commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

HELIOS is a business travel and expense management product developed by Shanghai Zhenhui Information and Technology Co., Ltd. The company is owned by the industry-leading IT consulting firm Hand (stock code: 300170).
Since its debut in 2016, HELIOS has been dedicated to bringing advanced management concepts to more organizations so that they can run their operations more productively and efficiently. It has processed over 5 million receipts and handled over 10 billion yuan in reimbursements. With close to one million monthly active users, it dominates the SaaS market in China. Huilianyi is dedicated to providing more customers with advanced management concepts to help them run more efficiently.

What is the primary source of funding for this project?

From the company's own income.

What other projects/ecosystem stakeholders is this project associated with?

None.

Use-case details

Describe the data being stored onto Filecoin

This is masked data from our system, approximately 1 PiB in total.

Where was the data in this dataset sourced from?

A masked enterprise dataset from our customers, covering the hotel, transportation, manufacturing, and other industries.

    import java.util.Arrays;

    // The class name is a placeholder; the original snippet showed only the two methods.
    public class MaskUtil {

        /**
         * Replaces the middle of str with '*', keeping the first headCharCount
         * and the last tailCharCount characters in clear text.
         */
        public static String maskSensitiveData(String str, int headCharCount, int tailCharCount) {
            if (str.length() < headCharCount + tailCharCount) {
                throw new IllegalArgumentException("Plaintext is too short to mask");
            }
            // Number of characters to hide; zero yields an empty mask.
            int len = str.length() - headCharCount - tailCharCount;
            char[] buf = new char[len];
            Arrays.fill(buf, '*');
            return str.substring(0, headCharCount) + new String(buf) + str.substring(str.length() - tailCharCount);
        }

        public static void main(String[] args) {
            System.out.println(maskSensitiveData("120115201406180712", 6, 4));  // 120115********0712
            System.out.println(maskSensitiveData("9558820200019833888", 6, 4)); // 955882*********3888
            System.out.println(maskSensitiveData("18810754438", 3, 2));         // 188******38
        }
    }

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

[sqlResult_3175590.csv](https://github.com/filecoin-project/filecoin-plus-large-datasets/files/8804253/sqlResult_3175590.csv)

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

We confirm that our dataset is public.

What is the expected retrieval frequency for this data?

2-3 times a year. 

For how long do you plan to keep this dataset stored on Filecoin?

540 days or more.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

China only.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

We've prepared some hard disks for offline transfer. For storage providers that are farther away, we'll distribute the data online.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We've contacted some Asian storage providers on Slack and Telegram; they need DataCap in order to offer us a low price. If PL can find more storage providers for us, we are happy to cooperate with them. We may make deals with more than 10 nodes.

How will you be distributing deals across storage providers?

Less than 15% of the data for each storage provider.
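
As a rough illustration of the cap above, the following sketch checks a planned split against a per-provider limit. It is not part of the application: the class `DealDistribution`, its methods, and the equal split across 10 providers are all hypothetical, chosen only to match the figures stated here (about 1 PiB, under 15% per provider, possibly more than 10 nodes).

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the applicant's tooling.
final class DealDistribution {

    // Smallest number of providers for which an equal split keeps every
    // provider strictly below capPercent of the dataset.
    static int minProviders(int capPercent) {
        if (capPercent < 1 || capPercent > 100) {
            throw new IllegalArgumentException("capPercent must be in 1..100");
        }
        // n providers hold 100/n percent each; 100/n < capPercent means n > 100/capPercent.
        return 100 / capPercent + 1;
    }

    // True if every provider's share is strictly below capPercent of the total.
    static boolean respectsCap(long[] sizePerProvider, int capPercent) {
        long total = Arrays.stream(sizePerProvider).sum();
        for (long size : sizePerProvider) {
            // Integer form of size/total < capPercent/100.
            if (size * 100 >= (long) capPercent * total) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long totalTiB = 1024; // about 1 PiB, the size stated in the application
        int providers = 10;   // assumed count, used only as an example
        long[] shares = new long[providers];
        Arrays.fill(shares, totalTiB / providers);
        shares[0] += totalTiB % providers; // give the remainder to the first provider

        System.out.println(minProviders(15));        // 7
        System.out.println(respectsCap(shares, 15)); // true
    }
}
```

Under these assumptions, an equal split needs at least 7 providers to stay below 15% each, so a plan with 10 or more nodes leaves some headroom.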

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes. We are ready to make deals.
We also have a lot of non-public data. If Filecoin can accept non-public data, we would be happy to make more deals in the future.