Octachron / codept

Contextual Ocaml DEPendencies Tool: alternative ocaml dependency analyzer
Other
59 stars 10 forks source link

Use `cuts` instead of `split_on_char` #25

Closed dinosaure closed 4 months ago

dinosaure commented 1 year ago

This PR is a possible improvement about what we expect when we use the basic function split_on_char. From my point of view, String.split_on_char is problematic because:

For these two reasons, Support.cuts (took from the great astring):

The documentation explicits invariants too.

Octachron commented 1 year ago

I am surprised that astring is using the O( match_len * text_length) algorithm. Do you know why?

dinosaure commented 1 year ago

I am surprised that astring is using the O( match_len * text_length) algorithm. Do you know why?

I don't know details about that (/cc @dbuenzli) and, from my point of view, the algorithm seems naive but good enough for the purpose.

dbuenzli commented 1 year ago

I don't think there's any particular reason. On most uses of these kind of functions it seems good enough.

That being said since I have been brought here I will give my two unsolicited €cents :–)

I'm not sure I see the point of this PR, it seems String.split_on_char does the job in every case, use that it already belongs to the stdlib. Less code and no complexity objections. I suspect I won't see in my life-time a platform which has multibyte path separators. If you worry about these empties then most of the applications here go over the list right after so you can eliminate those in that pass and/or simply provide a support functions that filters the result of String.split_on_char.

dinosaure commented 1 year ago

I'm not sure I see the point of this PR, it seems String.split_on_char does the job in every case, use that it already belongs to the stdlib. Less code and no complexity objections. I suspect I won't see in my life-time a platform which has multibyte path separators. If you worry about these empties then most of the applications here go over the list right after so you can eliminate those in that pass and/or simply provide a support functions that filters the result of String.split_on_char.

Yes, to clarify the goal of this PR, it's just matter of readability which is particulary subjective depending on the reader. As you said, split_on_char does job but I let @Octachron to judge if it's better or not.

Octachron commented 4 months ago

I have pushed a commit 36bbd99fc88b which switch to the more complex version if the path separator is a full string (with room for optimisation if it is ever needed). I am not fond of the empty parameter that seems better handled by a separate function. I am thus closing this PR as completed, thanks for the suggestion!