Raynes / fs

File system utilities for Clojure.
453 stars 119 forks source link

Audit everything and make sure all behavior is as consistent as possible #47

Open Raynes opened 11 years ago

Raynes commented 11 years ago

Right now behavior of some functions is inconsistent with others. I made the decision when I took over the project that I would, whenever possible, return File objects instead of paths. The purpose of this was to prevent unnecessary creation of file objects just to pull the string paths out and then recreate more File objects later.

The problem is that this doesn't make sense for some functions and they are inconsistent. Some functions it does make sense for don't do it and are also inconsistent. I should carefully audit the library and make behavior consistent. This should, if at all possible, be the last breaking API change before all of fs's users vow to murder my first born.

Comments welcome.

zsau commented 10 years ago

I would prefer everything return string-based paths, because it avoids unnecessary exposure of Java objects and permits consistency with functions for which it doesn't make sense to return a File.

Anecdotally, when I use fs I more often want to manipulate paths as strings than call File methods. And I don't manipulate paths in tight loops, so the performance overhead of creating File wrappers for each operation would be negligible for me.

Raynes commented 10 years ago

But what's wrong with exposure to File objects? They're perfectly acceptable representations of files. They are, in fact, objects designed specifically for representing files. You can get a string from them by calling str, you can make them by calling (file ..) on strings, and all (most? should be all) of fs can handle both strings and files.

Regarding manipulating paths as strings, could you show me an example of the kind of things you do? I'd be surprised if fs's insistence on returning File objects hindered your ability to do this very much.

zsau commented 10 years ago

There's nothing wrong with Files per se, but the Clojure philosophy is that data should be kept as POD whenever possible, not locked up in objects (which are essentially mini-DSLs). Returning Java objects also makes the library (a) slightly harder to use for Clojurians who don't know Java, and (b) less portable (e.g. if someone wanted to port fs to CLJS or Clojure.NET, obviously they can't have it return Java File objects, meaning client code must change).

Regarding string manipulation, I sometimes need to deal with relative paths (e.g. an rsync-like tool I wrote that copies a directory tree to a new location). Other times I want to do a regex replacement on a path, or some other string operation that fs doesn't directly support. Of course one can convert from Files to strings easily, but the reverse is also true: so it's largely a question of which type of return value is more convenient for users of fs. I can only offer a single data point on that, which is that I personally find strings more convenient than Files.

Raynes commented 10 years ago

The Clojure philosophy is and always has been that if Java isn't broke, Clojure shouldn't fix it. Once again, File objects are designed for representing files. Given how easy it is to go between them and strings, I just can't justify not returning File objects.

a) Clojure is already hard to use if you plan to ignore Java and its libraries. The fact that you can even work with Java libraries is one of Clojure's biggest selling points. b) No matter what you do, if you're working with files somebody somewhere is going to have and use a File object. No getting around that. If you don't call File object methods yourself (instead calling fs's functions on them), it's hardly less portable than just using strings.

zsau commented 10 years ago

OOP is broke, in that objects can't be manipulated generically by a large library of standard functions. ("It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures.") Clojure has many functions that operate on strings, but not many that operate on Files.

It's true that Clojure is tightly integrated with Java, but it doesn't generally force Java objects upon you except when there's a clear advantage in doing so. Other than performance (which I can only comment on anecdotally), are there any other advantages to returning File objects? Yes they're designed for representing files, but that isn't actually an advantage in itself, is it?

Raynes commented 10 years ago

Sure on the first count, but you can get a string if you want it. I posit that users of this library don't often do hefty string manipulation on paths. I also posit that if they do, they're probably concatenating paths and they should do that with the file function which accepts strings, Files, or a combination of both. Most importantly, if they really do need to do it, they can.

What you're proposing involves doing this in almost every single function in fs: Take string, convert to File, call File methods, call str on File object. Every time. That's really backwards, isn't it?I just don't think I can live with myself if I have a library full of functions that simply wrap and unwrap strings on every call when in 99% of cases having a File object instead of a string path is perfectly acceptable.

This is originally what fs did -- it wrapped and unwrapped every time. If I force these functions to return a string, what about the users who need/want a File object? If they want a string they can easily get one from the File objects fs currently returns. If all fs functions returned string paths and they don't want a string, they'd have to wrap the returned string in an objectively more useful File object yet again, requiring that this path get wrapped in File twice for one single fs function call on it. Want to pipe these paths through numerous fs functions? Wrap and unwrap on every single one.

I haven't done any benchmarking so I don't know how big of an impact on performance it would be, but what I'm really hung up on is that it screams "fundamentally wrong" to do this wrapping all over the place. I don't see enough benefits to warrant the indirection.

Finally, please don't take my apprehension as disinterest. I'm just trying to make sure we understand each other!

zsau commented 10 years ago

I can definitely understand your dislike for constant wrapping and unwrapping from an aesthetic standpoint. But I'd argue the need to wrap/unwrap is a symptom of the poor design of Java's APIs, and would not indicate any problem with fs' design.

As a Clojurian who knows Java but only touches it when absolutely necessary, I appreciate it when Clojure libraries shelter me from Java's nastiness. I'd guess that many users of fs probably feel the same, since File already can do much of what fs does.

If there were benchmarks showing significant performance losses due to the wrapping/unwrapping, I'd definitely reconsider my opinion. On the other hand, would benchmarks that showed very minimal performance losses be able to sway your opinion, or are the aesthetic concerns paramount?

Raynes commented 10 years ago

Well, the paramount issue is that I don't agree that Java APIs are inherently nasty and terrible. File is a pretty tame class. I feel like the need to wrap and unwrap would be symptom of my poor design if I decided to make the API function this way -- not a symptom of Java's poor design. :P

The point of fs isn't really to produce a library replacing java.io.File, but to produce a library supplementing it. The reason that fs wraps so much of the File class is just to support the current working directory magic that fs provides. I would never, ever, ever, ever in a million years wrap a Java method solely so I didn't have to type a dot and could pretend that there wasn't any Java underlying what I'm doing! In fact, of the JVM had support for switching the current working directory at runtime, most of fs would disappear, rendered unnecessary. Only the various utilities it provides that File doesn't would remain.

That said, I suppose if it were very important to someone, they could throw together a thin wrapper around the API to stringify everything, but ick. I mean, most of these functions simply become (comp str fs/somefunction). It's exactly that, how simple it is to go from File to string, that makes me surprised you feel so strongly about this.

zsau commented 10 years ago

Your second paragraph explains a lot. I guess it means fs isn't the library for me, since I absolutely would still want a thin wrapper around common File functions even without the cwd stuff.

Raynes commented 10 years ago

Totally understandable. But, just so we're clear, you realize that in most cases it's literally the difference between:

(parent some-file)

and

(.getParent some-file)

In a lot of places, the APIs are exactly the same without regards to naming. The one single thing that fs does in these kinds of wrapper functions is call its own file function on the input so that the current working directory is maintained. It is the thinest of thin wrappers ever, and without the cwd stuff the only reason to wrap those methods in functions would be a) because you need a higher order function you can pass to another function b) you really, really had leading dot characters :p

Given this, why do you need a wrapper so badly? Totally curious for educational purposes about my market on my part. :)

zsau commented 10 years ago

Yes, I do realize that. I don't "need" a wrapper of course, but I want one for convenience reasons (string operations, HOFs) as well as for isolation from Java APIs. Basically, I want the API that a hypothetical native Clojure filesystem library would have; whether it happens to be backed by Java is immaterial to me.

Raynes commented 10 years ago

Fair enough!

That said, I have to encourage you to try to be more open to Java APIs. They're not all that bad. Some of them, such as the Joda time libraries, are actually pretty darn good. Joda is an example where you would think you have a great wrapper (clj-time), and indeed it is a great wrapper, but it isn't all inclusive and one still needs to use the Joda Java APIs directly from time to time.

For what it is worth, I've never written a line of Java in my life. Everything I know about Java was learned simply by working with Java APIs from Clojure. I'm a proponent of wrapping when the underlying API can benefit from being Clojureized. The File API doesn't fall under that umbrella for me. :(

In any case, I appreciate the discussion and interest in the library. I have some ideas about how I could potentially expose the API as string-based as well without a ton of work on my part, and we might get a neat secondary library out of the deal.

:beers:

lkrubner commented 10 years ago

I think it is fine to return objects or strings, so long as the library is consistent. I hate having to guess (and I find myself testing functions in the repl with (pprint) to see what I really get back). Objects are fine, just be consistent. (and thanks for all the work you've put into this)