dwyl / cid

❄️cid ("content id") is a human-friendly (readable/typeable) unique ID function built for distributed/decentralised systems.
GNU General Public License v2.0
34 stars 3 forks source link

Use ex_multihash instead of :crypto.hash(:sha512, input) #8

Open nelsonic opened 5 years ago

nelsonic commented 5 years ago

At present, the Erlang :crypto.hash(:sha512, input) is called directly when creating a cid: https://github.com/dwyl/cid/blob/3e8eb8267a7045910381dbe16bcce87f97922694/lib/cid.ex#L25

In light of the fact that https://github.com/ipld/cid achieves a pretty similar goal, I propose that we use multihash instead of directly creating a :sha512 hash. This will ensure forward compatibility and also mean that we can use the JS cid function on the client.

Todo

nelsonic commented 5 years ago

If you need more context on the IPFS/IPLD CID function, see: https://github.com/nelsonic/learn-ipfs/issues/8 and https://github.com/nelsonic/learn-ipfs/issues/5

RobStallion commented 5 years ago

@nelsonic I forked the repo and installed excoveralls to check the code coverage.

I know that they currently only have doctests but I figured if those doctests actually worked then all should be okay with them (unless I am misunderstanding why you might want separate tests)

image

COV    FILE                                        LINES RELEVANT   MISSED
 95.8% lib/multihash.ex                              261       24        1
  0.0% lib/util.ex                                    23        5        5
[TOTAL]  79.3%

There are no tests in the util file. There are only 2 function definitions however and one is a private function that is called by the other function. (What I am trying to say (UNLCEARLY) is that we would only need to test one function here)

There is only one line inmultihash.ex that is not being.

      do: encode(function, digest, length)

It is in the following function

  @spec encode(binary, binary, integer_default) :: on_encode
  def encode(<<_hash_code>> = hash_code, digest, length) when is_binary(digest) do
    with {:ok, function} <- get_hash_function(hash_code),
      do: encode(function, digest, length)
  end

I have called some of the functions in an iex session and have been able to replicate the results from the documentation. I haven't been able to test the function above though.

I can call it but I am not sure what argument to pass in to get past the following line

    with {:ok, function} <- get_hash_function(hash_code) do

as the get_hash_function function is keeps retuning an error for me.

With regards to this point

figure out how to use https://github.com/multiformats/ex_multihash to hash content.

I wouldn't say that I fully understand everything about hashing (right now) but the elixir code in this module is very well written and fairly straightforward to understand. (Plus the examples are pretty great)

nelsonic commented 5 years ago

@RobStallion doctests are only testing the signature of the functions. I ran the doctests locally too: image Warnings galore.

Did you push a commit to GitHub with your addition of excoveralls?

RobStallion commented 5 years ago

@nelsonic the doc tests do appear to be testing the functions

iex> Multihash.encode(:sha1, :crypto.hash(:sha, "Hello"))
{:ok, <<17, 20, 247, 255, 158, 139, 123, 178, 224, 155, 112, 147, 90, 93, 120, 94, 12, 197, 217, 208, 171, 240>>}

This calls the functions with the above arguments and expects the result below. These results can be replicated in your iex shell as well.

Also, if you change the returned binary in anyway the test fails.

The warnings all appear to be of the following... @doc attribute is always discarded for private functions/macros/types

The only reason this appears to be a thing is because the devs decided to put documentation on their private helper functions.

nelsonic commented 5 years ago

Yeah, Docs in private functions are a good thing. I don't know why Elixir considers them a warning. 🙄 I'm quite confident in the doctest for most of these functions, I think we should consider writing a test for Util.sum for completeness. Creating a PR improving the test coverage of a project is more of a test of the "responsiveness" of the repo/package/module maintainer. i.e. do they accept PRs that don't change the code but simply improve it's overall reliability?

RobStallion commented 5 years ago

@nelsonic Upon further inspection, the Multihas.Util/sum/2 function just ends up calling Multihash.encode/3 with a hash_type of :sha1, :sha2_256, :sha2_512.

This is probably the reason that they did not include the tests in the first place. (I should have noticed this on my first pass through 😞)

I can and have added tests for this now (in the same doc test format as the maintainers) and opened a pr here. Do you still want me to open a PR with them?

RobStallion commented 5 years ago

@nelsonic

figure out how to use https://github.com/multiformats/ex_multihash to hash content.

The code below is a modified version of the current Cid.make/2.

It can be tidied and the variables can be given better names (didn't think this was important for proof of concept) but it shows that we can use ex_multihash and still allow our tests to pass.

Original

  def make(input, length \\ 32) do
    hash = :crypto.hash(:sha512, input)

    hash
    |> Base.encode64()
    |> String.replace(~r/[Il0oO=\/\+]/, "", global: true)
    |> String.slice(0..(length - 1))
  end

With ex_multihash

  def make(input, length \\ 32) do
    hash1 = :crypto.hash(:sha512, input)
    {:ok, <<_multihash_code, _length, hash2::binary>>} = Multihash.encode(:sha2_512, hash1)

    hash2
    |> Base.encode64()
    |> String.replace(~r/[Il0oO=\/\+]/, "", global: true)
    |> String.slice(0..(length - 1))
  end

Multihash takes a hash and prepends info onto it.

<hash-func-type><digest-length><digest-value>

In the example above I am simply pattern matching the first 2 values, doing nothing with them and using the hash2 value in what was the original function call.

This is not how we would use it in production I'm sure, but it does show what we can use the multihash_code or the length variables for

nelsonic commented 5 years ago

@RobStallion this is a good start. thanks! 👍 What is your appetite for doing a "deep dive" into how the CID function works?

RobStallion commented 5 years ago

@nelsonic I am up for diving deep 💯, but just want to be clear on what you mean by CID function.

Do you mean that this example in the readme shows? Essentially create the functionality that the readme talks about?

nelsonic commented 5 years ago

@RobStallion using multihash is part of the puzzle but it's only the "four corners" of the puzzle. Take a look at https://proto.school https://github.com/nelsonic/learn-ipfs/issues/5

nelsonic commented 5 years ago

GOTO: #11