docopt / docopts

Shell interpreter for docopt, the command-line interface description language.
Other
494 stars 53 forks source link

docopt_get_values: doesn't handle arguments with whitespaces properly #70

Open nazarewk opened 1 year ago

nazarewk commented 1 year ago

This is not what I was expecting:

source "$(command -v docopts).sh"
declare -A args
args['FILE,#']=2
args['FILE,0']=somefile1
args['FILE,1']='some file with space inside'

array=( $(docopt_get_values args FILE) )

if [[ "${array[1]}" != 'some file with space inside' ]] ; then
    echo "element #1 is not properly set"
fi

if [[ "${array[*]}" == 'somefile1 some file with space inside' ]] ; then
    echo "5 elements - 1 per word"
fi

There is no indication it should not work in docs or comments

Sylvain303 commented 1 year ago

Hello @nazarewk

Apparently yes you're right, this function is poorly tested and poorly used. It returns a string suitable for $IFS explode for for loop. Apparently I never used it anywhere else but in bats testcase.

Do you think we should return something else? Bash array was poorly supported in macOS in the past, and people using bash array are warriors (because the syntax and behavior is awful) so I suppose the problem nerver arise before. :sweat_smile:

Would you write a sample bash-wish-code you would like to use?

declare -a myarray

# I want array!
#How I would like to receive data into my myarray?

We have some old and odd discussion on stackoverflow...

Bash has some builtin functions that we could use: readarray and printf -v var :thinking:

Or you could explain the usecase you're trying to achieve may be another approche will also let us discover some new feature we could use?

Hope, that helps.

Thanks for the bug report, I will document and fix.

Regards, Sylvain.

Sylvain303 commented 1 year ago

Hello @nazarewk

I made some test with GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)

#!/usr/bin/env bash

# master branch docopts.sh relative path
source ../../docopts.sh

declare -A args
args['FILE,#']=2
args['FILE,0']=somefile1
args['FILE,1']='some file with space inside'

test_array()
{
  if [[ "${array[1]}" != 'some file with space inside' ]] ; then
    echo "element #1 is not properly set"
  fi

  echo "array as ${#array[@]} elements"

  if [[ "${array[*]}" == 'somefile1 some file with space inside' ]] ; then
    echo "5 elements - 1 per word"
  fi
}

docopt_get_values2() {
    # a nameref with -n
    local -n __our_array_ref=$1
    local opt=$2
    local nb_val=${__our_array_ref[$opt,#]}
    local i=0
    while [[ $i -lt $nb_val ]]
    do
        printf "%s\n" "${__our_array_ref[$opt,$i]}"
        i=$(($i + 1))
    done
}

echo "================== docopt_get_values"
unset array
array=( $(docopt_get_values args FILE) )
test_array

echo "================== docopt_get_values2"
docopt_get_values2 args FILE
unset array
IFS=$'\n' array=( $(docopt_get_values2 args FILE) )
test_array

In the sample above, I also make use of nameref in bash, which facilitate (not the code reading) but the usage of indirect variable name use. Which permits to discard eval which I prefer to avoid because of too many "privileges" involved in such code.

eval code was kept to maintain very old bash compatibly for macOS users. Are you using macOS?

Apparently bash $IFS has a strong behavior on array assignment.

#!/usr/bin/env bash

f ()
{
  # generate fixed text with space
    echo '
"2011-09-04 21.43.02.jpg"
"2011-09-05 10.23.14.jpg"
"2011-09-09 12.31.16.jpg"
"2011-09-11 08.43.12.jpg"
'
}

tmp=$(mktemp ./tmpXXX.sh)
# without IFS change in "normal" fixed bash eval/REPL loop
cat > $tmp << EOT
FILES=(
$(f)
)
EOT
echo 'echo ${#FILES[@]}' >> $tmp

echo $tmp
bash $tmp

# now in function eval
FILES=( $(f) )
echo ${#FILES[@]}

# with IFS changed to only include \n
unset FILES
IFS=$'\n' FILES=( $(f) )
echo ${#FILES[@]}

Which outputs

./tmpp25.sh
4
8
4

As docopt_get_values is clearly relying on bash array which require bash >= 4 (I don't know the exact version). May we could craft an helper, if needed, that supports space keeping... :thinking:

I'm seeing those options:

# in doc usage, as you did, that bash don't keep space in function call evaluation/assignent 
# beware that file shouldn't contains \n neither (who's is doing that?) 
IFS=$'\n' array=( $(docopt_get_values2 args FILE) )

Or we may provide a nameref trick function that could "fill" the array named variable passed as argument. (not tested)

docopt_fill_array_with_values args FILE array

What do you think?

nazarewk commented 1 year ago

To be honest anything that would make the behavior crystal clear would suffice.

I'm not 100% happy with it (declaring global array, still docopt should probably be handling the whole script globally anyway), but I moved to docopt_get_eval_array invocation instead at https://github.com/cachix/devenv/commit/b640f708#diff-745f73299e82da7d570ef258ed08096e451e43a18d07f01ae59dcf03dc070f53

Sylvain303 commented 1 year ago

@nazarewk

Thanks for you feedback.

I'm not sure at this level of coding (using array + eval + assoc array), that bash will help to make things "cristal clear". :wink:

I'm not 100% happy with it: declaring global array

If your target shell version is supporting array and assoc array, you're not forced to use globals. But you've to consider/handle the bash scope mechanism: using $() or pipe can/will create new inner scope. declare -a myarray will create a local scoped array if used inside a function. That's why docopts introduced a --no-declare so hackers could play with it.

At the time docopts was designed the docopts.sh wasn't exist. And then macOS limitation make that helper script a bit more complex it was originally. Using eval is a wrong habit, but for modifying parent script env, there's not so much alternative. The original docopts API design was using it, so I kept that too. For being retro-compatible.

So I made a PoC function which seems to behave has expected:

# Doc:
# converts a repeatable option parsed by docopts into a bash 4 ARRAY
#   ARGS['FILE,#']=3
#   ARGS['FILE,0']=somefile1
#   ARGS['FILE,1']=somefile2
#   ARGS['FILE,2']=somefile3
# 
# Usage:
#  delcare -a myarray  
#  docopt_fill_array_with_values ARGS FILE myarray 
docopt_fill_array_with_values()
{
    # a nameref with -n
    local -n __our_array_ref=$1
    local opt=$2
    local array_name=$3
    local nb_val=${__our_array_ref[$opt,#]}
    local i=0
    while [[ $i -lt $nb_val ]]
    do
        local -n arr_fill="${array_name}[$i]"
        printf -v arr_fill "%s" "${__our_array_ref[$opt,$i]}"
        i=$(($i + 1))
    done
}

:arrow_up: What is interesting here: I used bash >4 mechanism which are making the inner code less clear. I'm using nameref twice and printf -v var which cannot use array as target but it seems it can use nameref :+1: . The whole function provide the expected behavior as handling a destination named third argument $3.

Which let me think I could reuse the same behavior for getting rid of eval may be in the docopts.sh wrapper / helper. :thinking:

still docopt should probably be handling the whole script globally anyway

I'm not sure what you're wanting to say with the sentence above?

docopts is mimicking the docopt (without S) behavior for bash. Argument parsing is supposed to be done as global script scope. Then you call what ever inner functions you want. We had a aborted discussion about bringing docopts to function call in #43, but it's not the design of docopts.

I'm not sure how to accomplish that in a "code readable" manner nor with good performance too. :shrug:

Sylvain303 commented 1 year ago

test_with_space.sh.txt :arrow_up: updated the sample code for testing behavior