com-lihaoyi / os-lib

OS-Lib is a simple, flexible, high-performance Scala interface to common OS filesystem and subprocess APIs
Other
690 stars 71 forks source link

Make os.Path and os.RelPath implement CharSequence #247

Closed Flowdalic closed 8 months ago

Flowdalic commented 8 months ago

Having os.Path and os.RelPath, in fact os.FilePath, implement CharSequence reduces the boilerplate when using functions that take a input collection and produce some form of textual output.

Consider the following example

def printBashArray(out: Appendable, name: String, content: Seq[CharSequence]) =
  out.append(f"${name}=(\n")
  for c <- content
  do out.append(f"\t${c}\n")
  out.append(")\n")

and

val fooPaths: Seq[os.RelPath]

Currently we have to

printBashArray(out, "FOO_PATHS", fooPaths.map(_.toString))

after this change, we can simply

printBashArray(out, "FOO_PATHS", fooPaths)

(Of course, we could also declare 'content' as Seq[Any], but this blurres the intention.)

lefou commented 8 months ago

Currently we have to

printBashArray(out, "FOO_PATHS", fooPaths.map(_.toString))

after this change, we can simply

printBashArray(out, "FOO_PATHS", fooPaths.map(_.toString))

I can't spot a difference.

I don't think os.Path qualifies to IS-A Seq[CharSequence]. A os.Path represents a file system resources represented itself by a strings. Isn't path.segements what you need? It returns an Iterator[String].

Flowdalic commented 8 months ago

I can't spot a difference.

Sorry, copy&paste mistake. Now fixed.

I don't think os.Path qualifies to IS-A Seq[CharSequence].

This PR is about os.Path and os.RelPath to qualify as CharSequence.

I have made good experience with making specialized types1 that are essentially a sequence of characters (minus potentially additionally restrictions) implement the CharSequence interface. Having them implement the CharSequence interface reduces the boilerplate, see my adjusted example, and make the overall handling of them more natural.

1: Examples include MacAddress, EMailAddress and Jid.

lefou commented 8 months ago

I'm still not convinced. I even don't understand your motivation at all.

If you need a CharSequence out of a os.Path, why can't you just use toString, since every String is already a CharSequence?

printBashArray(out, "FOO_PATHS", fooPaths.toString)

A path is not a char sequence. E.g. charAt or subSequence makes no sense on a multi-segemnt path. If you need a String, use toString.

Flowdalic commented 8 months ago
printBashArray(out, "FOO_PATHS", fooPaths.toString)

I think you may have missed that fooPaths type is Seq[os.Path], therefore what you suggested will not work. Hopefully, the motivation is now clearer and you may re-evaluate the usefulness of this change and re-consider this PR.

lefou commented 8 months ago

If feeding Seq[os.Path] is a typical use case of your APIs, you could provide an implicit conversion.

implicit def pathToCharSeq(path: os.Path): CharSequence = path.toString
Flowdalic commented 8 months ago

If feeding Seq[os.Path] is a typical use case of your APIs,

Yes, I do it constantly. Again, if you have a dedicated type for something that is (typically) expressed as a String, like MacAddress or EMailAddress, then it is very sensible to have that type implement CharSequence, as you will often end up feeding them to an API that does some sort of output (either UI output or textual output of some sort, like an XML writer).

you could provide an implicit conversion.

Unfortunately, this will not help.

This is not a case where implicit or given Conversation will be applied. As soon as the to-be-converted type is the parameterized type of a generic class, as is the case here, the implicit conversation to the expected type will not be applied.

For example:

#!/usr/bin/env -S scala-cli shebang
//> using scala "3.3.1"
//> using jvm "system"
//> using dep "com.lihaoyi::os-lib:0.9.3"

object GivenConversion:
 given Conversion[os.FilePath, CharSequence] = (f: os.FilePath) => f.toString()

def myMkString(input: Seq[CharSequence]) = input.map(_.toString).mkString("- ", "\n- ", "\n")

val paths = Seq(
  os.Path("/foo"),
  os.Path("/bar"),
)

import GivenConversion.*
myMkString(paths)

Results in

[error] ./os-path-given-conversion.sc:19:12
[error] Found:    (os$minuspath$minusgiven$minusconversion$_.this.paths : Seq[os.Path])
[error] Required: Seq[CharSequence]
[error] myMkString(paths)
[error]            ^^^^^

I wonder where this aversion against subclassing CharSequence stems from besides not understanding the issue doing so fixes. I have not heard any downsides to os.Path implementing the CharSequence interface.

Do you want os-lib users to deal with additional friction when using the os-lib API although it could be avoided?

SethTisue commented 8 months ago

fwiw, I agree with @lefou — tons of data types have a string representation that is relatively frequently needed, but that doesn't mean they should all implement CharSequence. the purpose of the CharSequence interface, as I understand it, is to capture the commonality between domain-independent String-like things like StringBuilder and StringBuffer — rather than to be a sort of poor man’s implicit conversion from domain-specific types to String

SethTisue commented 8 months ago

Note that the Java standard library designers seem to agree as well:

scala 2.13.12> java.nio.file.Path.of("/tmp")
val res1: java.nio.file.Path = /tmp

scala 2.13.12> res1: java.lang.CharSequence
               ^
               error: type mismatch;
                found   : java.nio.file.Path
                required: CharSequence
lefou commented 8 months ago

@Flowdalic ,

Trying to help you with your use case, I think your example should work with

implicit def pathToCharSeq(paths: Seq[os.Path]): Seq[CharSequence] = paths.map(_.toString)

I wonder where this aversion against subclassing CharSequence stems from besides not understanding the issue doing so fixes.

Wow. "Aversion". I think you misunderstood something. And I don't like the tone.

I have not heard any downsides to os.Path implementing the CharSequence interface.

I see it as @SethTisue. A domain type using a string internally, isn't necessarily a string itself. Also, I wrote before:

A path is not a char sequence. E.g. charAt or subSequence makes no sense on a multi-segemnt path. If you need a String, use toString.

Doesn't that qualify as at least two downsides?

Do you want os-lib users to deal with additional friction when using the os-lib API although it could be avoided?

Absolutely not. I simply don't agree this is an issue that produces "friction".

Flowdalic commented 8 months ago

Wow. "Aversion". I think you misunderstood something. And I don't like the tone.

I am sorry, that was not intended to be in any way an attack. Maybe my understanding what 'aversion' expresses is not correct.

A path is not a char sequence. E.g. charAt or subSequence makes no sense on a multi-segemnt path. If you need a String, use toString.

Doesn't that qualify as at least two downsides?

That is something I do not understand: Assume val myPath: os.Path, then why should myPath.subSequence(3, 7) not be sensible, while myPath.toString().subSequence(3, 7) clearly is?

Note that the Java standard library designers seem to agree as well:

I like the Java standard library very much. I think it is one of the best designed standard libraries out there. That said, it has its warts and this is one of those.

I leave at it that and with saying again that I am sorry if my reply came across as impolite. That wasn't my intention, I just wanted to learn about the arguments against this change.

lefou commented 8 months ago

@Flowdalic, since you asked, let me add a last comment to this PR.

why should myPath.subSequence(3, 7) not be sensible, while myPath.toString().subSequence(3, 7) clearly is?

The latter case is just some known string API on a string, and we all know strings are arrays or sequences of characters. Since its concept is that old, it's rather "intuitive" API.

But in the former case, we're dealing with a path. So what could subSequence(3,7) on a path mean? Since a path has segments, maybe it could mean the sub-path starting at the third segment?

So, since a CharSequence is just a Java-esque of Seq[Char], but a path is already a Seq[PathSegment] (at least conceptually, in fact, segments is a Seq[String]), how can a type be a Seq of a path segment (String) and a single character (Char) at the same time? Once we spot discrepancies like that in an API, we should rather avoid it.

I understand the desire to automagically support your favorite domain types everywhere. But implementing a CharSequence in os.Path isn't the right choice. It may very well be a legit compromise in other languages, but not in Scala as I know it.

To pick up your other example, the XML writer; in Scala, the preferred or typical way to support different domain type to be written as XML is the concept of type classes.

Flowdalic commented 8 months ago

Thanks for the example. That provides some insight about why you reject this change. However the rationale

But in the former case, we're dealing with a path. So what could subSequence(3,7) on a path mean? Since a path has segments, maybe it could mean the sub-path starting at the third segment

i.e., that the behavior of subSequence ()is ambiguous Is flawed, because CharSequence defines its methods based on the return value of its toString() method.

The toString() method is defined as returning "... a string containing the characters in this sequence in the same order as this sequence." And subSequence(start, end) is defined as a subsequence of this sequence starting with the char value at the specified index (the end index is similarly defined).

If you return anything else than the string from the 3 to 7th (exclusive) char, the you would violate the contract of CharSequence.

There is really no ambiguity introduced by this PR about what os.Path.subSequence(3, 7) would return.