Request: Allow passing URL for device name

yetamrra commented 4 years ago

It's useful that I can add an entry to /etc/sane.d/airscan.conf for a device:

[devices]
MyScanner = https://1.2.3.4:443/eSCL

Since the only value of MyScanner is the URL, it would be convenient to directly request a URL in sane_open(). Then I could do something like scanimage -d airscan:https://1.2.3.4:443/eSCL without updating files in /etc/sane.d.

Would you be open to a patch adding this?

alexpevzner commented 4 years ago

Hi @yetamrra,

actually, there are two values: URL and protocol. sane-airscan is becoming multiprotocol:

[devices]
My ESCL Scanner = http://1.2.3.4/eSCL, eSCL
My WSD Scanner = http://4.3.2.1/WSDScanner, WSD

And IPP will be added later.
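
For illustration only, here is a minimal Python sketch of splitting such a `[devices]` value into URL and protocol. It is not sane-airscan's own parser (which is written in C); the name `parse_device_value` and the fallback to eSCL when no protocol is given are assumptions.

```python
def parse_device_value(value, default_proto="escl"):
    """Split 'URL[, protocol]' into (url, protocol).

    Hypothetical helper. Falling back to eSCL for entries without a
    protocol field is an assumption, not confirmed behaviour.
    """
    # Split on the first comma only; a URL containing ',' would need more care.
    url, sep, proto = value.partition(",")
    url = url.strip()
    proto = proto.strip().lower() if sep else default_proto
    if not url:
        raise ValueError("missing URL")
    if proto not in ("escl", "wsd"):
        raise ValueError("unknown protocol: %r" % proto)
    return url, proto
```

With the entries above, `parse_device_value("http://4.3.2.1/WSDScanner, WSD")` gives the URL plus `"wsd"`, while a URL-only value keeps working under the assumed default.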

So I have a few questions:

What is the purpose of this patch?
How will I guess protocol?
In theory, a DNS-SD name may be any reasonable string, and it may have valid URL syntax. How should I tell whether it is a name or a URL?

alexpevzner commented 4 years ago

One more thing: if you are considering contributing a new feature, it is better to do it here: https://github.com/alexpevzner/sane-airscan-wsd

yetamrra commented 4 years ago

The main purpose is to support people who can't write files to /etc/sane.d. For example, Chromebooks ship with a read-only rootfs, so not even root can write a file there.
I hadn't realized a second field had been added. I guess I should have looked at the wsd fork :) I see a couple of options:

a. If protocol is the only additional value you plan to add, it could be combined with the URL, something like airscan:WSD@http://..., airscan:ESCL@https://..., airscan:IPP-Scan@http://.... etc.
b. Pass the whole string as the device name, like "airscan:http://1.2.3.4/eSCL, eSCL". As long as the per-device entry fits on one line, I don't see any limitations to what you could pass this way.
c. Combine the protocol into the URL, like wsd://..., escls://... etc. This one looks clean initially, but it might couple the protocol and URL in ways that are hard to extend later.

I like (a) as a matter of taste, but (b) seems perfectly workable and would probably result in a smaller change. Do you have a preference?

The existing table lookups should probably come first, with trying to parse the name as a direct entry only if nothing is found. If somebody has a device that describes itself as a URL that doesn't point back to itself, they probably already have other pathological things going on in their network.
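
A rough Python sketch of that lookup order, assuming option (b) for the direct form. The names `resolve_device` and `parse_direct`, and the dict standing in for the existing tables, are all hypothetical.

```python
def parse_direct(name):
    """Option (b): the device name is 'URL, protocol' (hypothetical format)."""
    url, sep, proto = name.partition(",")
    if not sep or "://" not in url:
        raise ValueError("not a direct entry: %r" % name)
    return url.strip(), proto.strip().lower()


def resolve_device(name, table):
    """Existing table entries win; direct parsing is only a fallback."""
    if name in table:
        return table[name]
    return parse_direct(name)
```

A device whose advertised name happens to look like a direct entry still resolves through the table, so existing setups would not change behaviour.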

alexpevzner commented 4 years ago

I guess ChromeOS comes with its own GUI application that lets the user configure a scanner by entering its URL and parameters, correct?

What if, instead of passing these parameters to sane_open() via a cryptic string, this application created a configuration file in a reasonable place in the user's home directory, and sane-airscan looked in that directory too when searching for configuration files?

This would have the advantage that scanners configured by this application would also be visible to "standard" scanning apps like xsane, simple-scan and even libreoffice/openoffice.

Note, I need 3 parameters describing a scanner:

Scanner name - used for logging.
Scanner URL
Protocol. Currently eSCL/WSD (case-insensitive in config file), IPP will be added later

P.S. A while ago I considered the idea of expressing the protocol by URL scheme. As a result we would get escl/escls, wsd/wsds, ipp/ipps and legacy http/https equal to escl/escls. I don't think this complexity can be explained to a common user :-)
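
To show what that scheme-based idea would involve, here is a small Python sketch. It is not part of sane-airscan; `SCHEMES` and `split_scheme` are hypothetical names, and mapping each custom scheme onto plain http/https as the transport is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# URL scheme -> (protocol, transport scheme). Legacy http/https mean eSCL.
SCHEMES = {
    "escl": ("escl", "http"), "escls": ("escl", "https"),
    "wsd": ("wsd", "http"), "wsds": ("wsd", "https"),
    "ipp": ("ipp", "http"), "ipps": ("ipp", "https"),
    "http": ("escl", "http"), "https": ("escl", "https"),
}


def split_scheme(url):
    """Return (protocol, url rewritten to its http/https transport scheme)."""
    parts = urlsplit(url)
    if parts.scheme not in SCHEMES:
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    proto, transport = SCHEMES[parts.scheme]
    return proto, urlunsplit(parts._replace(scheme=transport))
```

Eight schemes for three protocols is the complexity in question: every entry has to be documented and remembered by whoever writes the URL.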

yetamrra commented 4 years ago

Having it read config files from the home directory sounds like a nice idea. In fact, it seems like upstream sane could potentially look for configs in something like $XDG_CONFIG_HOME/sane to solve this problem in general rather than putting it specifically in sane-airscan.

Unfortunately, that won't help Chrome OS :) The reason is that our sane frontend (called lorgnette) runs under a separate userid that doesn't have access to the main user's home directory. Our flow looks roughly like this:

1. We do network discovery outside of SANE so that we can apply consistent policies across the OS (e.g. prefer https over http if the same device is reachable via both, enterprise policies that preload certain devices into the list or restrict which devices can be used, etc).
2. The user selects their scanner from the list above in the UI.
3. The UI talks to lorgnette over dbus and asks it to get the capabilities of a particular device.
4. The user sets up their scan parameters.
5. The UI opens an output file and passes the open fd and user's parameters to lorgnette with the request to get a scan from a particular device.
6. lorgnette uses libsane to get a scan from the specified device and writes it to the provided fd.

Steps 3 and 6 in that flow are much easier if the backend can accept a device string directly without having to explicitly configure the backend beforehand. It also helps users who can't write to /etc/sane.d for whatever reason, although I agree that use case is much less compelling for typical Linux users if the config can be read from the user's home directory. That said, if you think that's the wrong direction for airscan, I'm sure we can find a different scheme for Chrome OS.

alexpevzner commented 4 years ago

Well, it is not documented, but upstream SANE looks for configuration files in the current directory first, then in /etc/sane.d. I don't look in the current directory, but I may add a search in some $HOME-relative directory, if somebody needs it.

I understand the value of your approach for a user-oriented OS, like ChromeOS.

How do you plan to support WSD scanners in your design? WS-Discovery is a long story, completely different from DNS-SD. A WSD endpoint URL cannot be guessed from the DNS-SD output.

Also, in your design it seems to be impossible to scan in libreoffice directly into the document, without intermediate files (Insert->Media->scan->...). This is actually a convenient feature (well, if libreoffice could be called "convenient").

I think your life will probably be much easier if we find a way to export a couple of additional functions from sane-airscan, not covered by the standard SANE API. In this case you will be able to use WS-Discovery from sane-airscan, rather than implement your own.

yetamrra commented 4 years ago

To be honest, we haven't considered WSD up to this point. We're currently focused on getting eSCL to work well. Now that you have a WSD implementation nearly ready, maybe we should reconsider that later in the year :) As you suggested, it might make sense to refactor the discovery part out to a separate libairscan-discovery.so library that could be shared by sane-airscan, your airscan-discovery, and our discovery service. I can file another issue to start that discussion when we're ready to take a look at that if you like.

As for LibreOffice, you're right that it or other SANE frontends won't easily be able to display the same list of scanners as our native UI. They run inside a debian container, so the user can install SANE and configure it there if desired. If we see demand, I could imagine that we might someday make a backend that knows how to talk to our native UI and keep the scanner lists in sync inside and outside the container. That's not on the current roadmap, though.

alexpevzner commented 4 years ago

I believe the WSD branch is ready for release.

Currently I'm updating the documentation, and will release in a couple of days. To be honest, I'm a little bit nervous about possible regressions for existing happy users: the code base has nearly doubled since the latest legacy sane-airscan release. But eSCL support is also improved here, and there are some eSCL devices that work only with the WSD branch. So I think I should go ahead :-)

I know at least two devices that don't implement eSCL but work perfectly in WSD mode: HP LaserJet Pro MFP M125 and HP LaserJet Pro MFP M521. There must actually be a lot of them, but I've only recently started wide WSD testing.

Refactoring discovery into a separate library will be hard, because it shares a lot of common infrastructure with the SANE backend, like the HTTP client, event loop, XML handling and so on. It would be much simpler to add an additional entry point to sane-airscan that returns discovery information. The way airscan-discover is implemented is a quick and dirty hack.

alexpevzner commented 4 years ago

Hi @yetamrra,

after some thinking I came to the conclusion that I can accept this change.

With the WSD branch, device names look like this:

airscan:e0:Kyocera ECOSYS M2040dn
airscan:w0:Kyocera ECOSYS M2040dn

The airscan prefix comes from sane-dll, and the e0/w0 prefix has the following meaning:

'e' or 'w' is the protocol (eSCL/WSD). IPP-scan will use the 'i' prefix.
The subsequent number is assigned like a UNIX PID: it is incremented every time a new device is discovered, and checked for uniqueness in case it wraps around while long-lived devices still hold old numbers. Its purpose is to guarantee unambiguity even in the unlikely case of DNS-SD name clashes between network interfaces.

The string after the prefix is the device's DNS-SD name, which is expected to be a unique, though human-readable, identifier.
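
As an illustration, a device name of this shape could be taken apart as in the Python sketch below. It assumes the airscan: prefix has already been removed, and it keeps the number as a string because its exact encoding is not specified here; `parse_ident` is a hypothetical name, not a sane-airscan function.

```python
import re

# Protocol letters described above; 'i' (IPP-scan) is not in use yet.
PROTO_BY_LETTER = {"e": "eSCL", "w": "WSD"}

# <letter><number>:<DNS-SD name>; the name itself may contain ':' or spaces.
_IDENT_RE = re.compile(r"^([ew])([0-9a-fA-F]+):(.+)$")


def parse_ident(ident):
    """Split 'e0:Name' into (protocol, number as string, DNS-SD name)."""
    m = _IDENT_RE.match(ident)
    if m is None:
        raise ValueError("not a discovered-device name: %r" % ident)
    letter, number, name = m.groups()
    return PROTO_BY_LETTER[letter], number, name
```

The two example names then differ only in the protocol returned, which is how the same physical device can be listed once per protocol.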

The proposed syntax is the following:

protocol:name:url

Protocol is "escl"/"wsd", case-insensitive. You should use id_proto_by_name() to decode it to ID_PROTO.
Device name is some reasonable string. It is used as log prefix and to name trace files. It should not contain semicolon characters.
url is the endpoint URL.
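
A minimal Python sketch of parsing this syntax follows; the real change would be in C. `parse_direct_ident` is a hypothetical name, the fields are split on ':' so the name cannot contain that separator in this sketch, and accepting only http/https endpoint URLs is an assumption.

```python
def parse_direct_ident(ident):
    """Split 'protocol:name:url' into (protocol, name, url).

    Returns None when the string does not have that shape, so the caller
    can fall back to the normal table lookup.
    """
    # The URL itself contains ':', so split at the first two colons only.
    fields = ident.split(":", 2)
    if len(fields) != 3:
        return None
    proto, name, url = fields
    proto = proto.lower()  # the protocol is case-insensitive
    if proto not in ("escl", "wsd") or not name:
        return None
    # Assumption: only http/https endpoints are accepted.
    if not url.lower().startswith(("http://", "https://")):
        return None
    return proto, name, url
```

A discovered-device name such as `e0:Kyocera ECOSYS M2040dn` is rejected here because `e0` is not a protocol name, so the two forms do not collide.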

To implement it, it should be enough to modify the zeroconf_devinfo_lookup() function in the airscan-zeroconf.c file. If its ident parameter can be recognized as containing a valid protocol prefix and otherwise has valid syntax, the function should construct a zeroconf_devinfo structure instead of obtaining it from the tables. zeroconf_initscan_wait() should be bypassed in this case.

I'll ask you to submit your PR to the WSD branch. I will push it from there to the stable branch myself.

With this change, sane-airscan will still be usable to perform WS-Discovery by your app.

alexpevzner commented 4 years ago

Resolved

alexpevzner commented 4 years ago

Hi @yetamrra,

I just want to let you know that I've released version 0.99.4, which includes your patch.

yetamrra commented 4 years ago

@alexpevzner Thanks! We've been testing a git snapshot for a few days and will pull in the official release shortly.

alexpevzner commented 4 years ago

Hi @yetamrra,

How are you? I haven't heard from you for a long time...

you may want to look at the current state of https://github.com/alexpevzner/sane-airscan-unstable (formerly sane-airscan-wsd). I've just pushed a version there that removes the glib/libsoup dependency and implements the HTTP client itself.

Now it should be much easier to add AF_UNIX support there, as required by ChromeOS for IPP-over-USB support.

This version is about to be released as "stable" soon, after some more testing.

yetamrra commented 4 years ago

Hi @alexpevzner,

Thanks for letting me know! From a quick look, it looks like most of the raw HTTP parsing work is delegated to the nodejs parser. Have you found that to present any limitations compared to something like libcurl?

alexpevzner commented 4 years ago

Hi @yetamrra,

well, the nodejs HTTP parser is fairly complete, but it is only a parser: it doesn't implement the networking part of HTTP, or high-level stuff like HTTP redirects, authentication and so on. Also, it doesn't implement compressed (i.e., gzip) transfer encoding, and there is no MIME multipart parser (though the multipart parser in libsoup was broken, so I had to reimplement it anyway).

The nodejs URL parser doesn't properly parse relative URLs (URLs without a scheme), so I had to help it a little bit :-)

Also, things like computing an absolute URL from a base and a relative URL are in my hands.
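
For readers unfamiliar with that step, the Python standard library's `urljoin` shows the kind of resolution meant. This is only an illustration, not sane-airscan's C code; the wrapper name `resolve_location` and the example paths are made up.

```python
from urllib.parse import urljoin


def resolve_location(base_url, location):
    """Resolve a possibly relative URL (e.g. from a Location header)
    against the URL of the request that produced it."""
    return urljoin(base_url, location)


base = "http://1.2.3.4/eSCL/ScannerStatus"

# Relative path: replaces the last segment of the base path.
print(resolve_location(base, "ScanJobs/1"))
# Absolute path: keeps only scheme and host from the base.
print(resolve_location(base, "/eSCL/ScanJobs/1"))
# Already absolute: the base is ignored.
print(resolve_location(base, "https://5.6.7.8/eSCL/ScanJobs/1"))
```

The first two calls both yield `http://1.2.3.4/eSCL/ScanJobs/1`; the third returns its argument unchanged.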

On the other hand, high-level HTTP libraries, like libsoup and libcurl, do a lot of automation, like following HTTP redirects, which is convenient at first sight, but if you have buggy firmware at the other end of the connection, this automation may sometimes cause more trouble than benefit.

yetamrra commented 4 years ago

Thanks for the extra info. We're still interested in adding AF_UNIX support as you mentioned above, assuming you're willing to consider it. I'll open a new issue to discuss what that might look like.

alexpevzner commented 4 years ago

Yes, I remember. This is why I wrote to you: with the new HTTP client AF_UNIX support should be trivial to implement (in comparison to libsoup).

Could you remind me, what exactly should it do? In particular, how does it work if there are multiple USB scanners connected to the system?

alexpevzner / sane-airscan

Request: Allow passing URL for device name #25