01mf02 / jaq

A jq clone focussed on correctness, speed, and simplicity
MIT License
2.61k stars 63 forks source link

ER: --slurpfile and process substitution #155

Closed pkoppstein closed 4 months ago

pkoppstein commented 5 months ago

Contrast:

$ jq -nc --slurpfile x <(echo 1) '$x'
[1]
$ gojq -nc --slurpfile x <(echo 1) '$x'
[1]
$ jaq -nc --slurpfile x <(echo 1) '$x'
Error: /dev/fd/63: Invalid argument (os error 22)

$ jaq --version
jaq 1.3.0
kklingenberg commented 5 months ago

This happens here in line 267: https://github.com/01mf02/jaq/blob/ce988d55ec1fefaf998a8b77a7c4bb97705852b2/jaq/src/main.rs#L265-L268

It works if you swap out the mmap call with a read_to_end() call and change the signatures appropriately (diff shown below). I didn't measure the preformance downgrade though.

diff --git a/jaq/src/main.rs b/jaq/src/main.rs
index e4d1a86..33dec80 100644
--- a/jaq/src/main.rs
+++ b/jaq/src/main.rs
@@ -1,5 +1,6 @@
 use clap::{Parser, ValueEnum};
 use jaq_interpret::{Ctx, Filter, FilterT, ParseCtx, RcIter, Val};
+use std::io::Read;
 use std::io::{self, BufRead, Write};
 use std::path::PathBuf;
 use std::process::{ExitCode, Termination};
@@ -262,9 +263,11 @@ fn parse(filter_str: &str, vars: Vec<String>) -> Result<Filter, Vec<ParseError>>
     }
 }

-fn mmap_file(path: &std::path::Path) -> io::Result<memmap2::Mmap> {
-    let file = std::fs::File::open(path)?;
-    unsafe { memmap2::Mmap::map(&file) }
+fn mmap_file(path: &std::path::Path) -> io::Result<Vec<u8>> {
+    let mut file = std::fs::File::open(path)?;
+    let mut contents = Vec::new();
+    file.read_to_end(&mut contents)?;
+    Ok(contents)
 }

 fn invalid_data(e: impl std::error::Error + Send + Sync + 'static) -> std::io::Error {

My guess is that the mmap operation depends on the file living long enough. I tried swapping out the mmap crate with fmmap, which didn't need the file-opening step, but it failed the same way.

01mf02 commented 4 months ago

Thanks for the bug report, @pkoppstein, and thanks for the detailed analysis, @kklingenberg! I believe that the problem is not that the file lives long enough; it is because the file produced by <(echo 1) is not a regular file that can be memory-mapped. I originally used mmap instead of read_to_end because the latter is generally way slower. So we should not unconditionally use read_to_end, but only when mmap fails. That's what I did in my PR #168.