fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License

intelligent slice functionality for strings #413

Closed aman-godara closed 3 years ago

aman-godara commented 3 years ago

Description

Name of functionality: slice
Signature: slice(string, start(optional) = 1 or len(string), end(optional) = len(string) or 1, stride(optional) = 1, include_start(optional) = .true.)
Output: a new string_type object

Traverses the input string from index start to index end, taking steps of stride, and returns a new string. start may be greater than end, which lets the function reverse the input string. The start index is always included in the output unless include_start is set to .false., in which case the end index is included instead, so either the start index or the end index (or both) appears in the output string. The function is intelligent in the sense that if the user omits one or more of the three optional arguments (start, end, stride), it infers them from the arguments that are given; for example, a negative stride makes start default to len(string) and end default to 1 (see examples). But if the user does provide arguments, he/she is responsible for making them consistent (see example 3).

Examples:

  1. slice('12345', stride=-1) should give '54321'; start = 5, end = 1
  2. slice('abcd', stride=-2, include_start=.false.) should give 'ca'; start = 4, end = 1
  3. slice('abcde', 5, 2, 1) will give '' (empty string), because the user explicitly gave 1 as the stride argument
  4. slice('abcde', 5, 2) will give 'edcb'; stride = -1
  5. slice('abcde', end = 1, stride = -2) will give 'eca'; start = 5
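To make the defaulting rules concrete, here is a minimal sketch of the proposed behaviour in Python, assuming 1-based inclusive indices (as in Fortran) and that any start/end the caller passes lies in 1..len(string). The name `slice_string` is hypothetical (chosen to avoid shadowing Python's built-in `slice`) and this is not the stdlib Fortran API; it only reproduces the rules and examples stated above.

```python
def slice_string(string, start=None, end=None, stride=None, include_start=True):
    """Hypothetical sketch of the proposed slice(): 1-based, inclusive indices.

    Assumes any start/end passed by the caller lies in 1..len(string).
    """
    n = len(string)
    if stride is None:
        # Infer the direction only when both endpoints are given (example 4).
        stride = -1 if (start is not None and end is not None and start > end) else 1
    if stride == 0:
        raise ValueError("stride must be non-zero")
    if start is None:
        start = n if stride < 0 else 1
    if end is None:
        end = 1 if stride < 0 else n
    sign = 1 if stride > 0 else -1
    if include_start:
        # Walk from start towards end; start is always visited.
        indices = range(start, end + sign, stride)
    else:
        # Walk from end back towards start so that end is always visited,
        # then reverse to restore the start -> end traversal order.
        indices = reversed(range(end, start - sign, -stride))
    return "".join(string[i - 1] for i in indices)


# The five examples from the proposal.
print(slice_string("12345", stride=-1))                       # 54321
print(slice_string("abcd", stride=-2, include_start=False))   # ca
print(repr(slice_string("abcde", 5, 2, 1)))                   # ''
print(slice_string("abcde", 5, 2))                            # edcb
print(slice_string("abcde", end=1, stride=-2))                # eca
```

Note that an explicit stride is never "corrected" in this sketch: example 3 yields an empty string because range(5, 3, 1) visits nothing, which is the "user is responsible" behaviour described above.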

Prior Art

Python has this, though not exactly in the manner proposed above. In Python one can write

s = 'spiderman'
print(s[1:9:2])

which prints 'pdra'.

Please review the functionality and let me know your thoughts on this.

aman-godara commented 3 years ago

Another possibility is a flexible stride argument: instead of giving the exact value of stride, the user gives only its absolute value, and the sign is chosen from start and end. Taking example 3 from above, slice('abcde', 5, 2, 1) would then give 'edcb', because the stride is converted from 1 to -1.

ivan-pi commented 3 years ago

There is an extract function proposed in https://github.com/fortran-lang/stdlib/issues/406 with functionality similar to slice. I'm not sure whether the original proposal included an optional stride argument.

In any case I would support this. It might be a good idea to overload this for the intrinsic character type, even if the intrinsic slice syntax exists.

aman-godara commented 3 years ago

I prefer the first possibility over the second, because a user who wants the second behaviour can get it with an if/else on the stride before passing the arguments to the code written in #414. But if we implement slice the second way, a user who wants the first behaviour won't be able to get it from the slice function.
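The argument that the flexible-stride variant can be layered on top of the strict one with a single if/else can be sketched in Python as follows. Both function names are hypothetical, indices are assumed 1-based and inclusive, and start/end are assumed to be valid positions; the strict version takes the sign of stride exactly as given, while the wrapper only looks at its absolute value.

```python
def strict_slice(string, start, end, stride):
    """Hypothetical strict variant: the sign of stride is used as given.

    1-based, inclusive indices; start is always included.
    """
    if stride == 0:
        raise ValueError("stride must be non-zero")
    sign = 1 if stride > 0 else -1
    return "".join(string[i - 1] for i in range(start, end + sign, stride))


def flexible_slice(string, start, end, stride):
    """Hypothetical flexible variant built on strict_slice with one if/else:
    only |stride| matters, and its sign is chosen from start and end."""
    step = abs(stride)
    if start > end:
        step = -step
    return strict_slice(string, start, end, step)


print(repr(strict_slice("abcde", 5, 2, 1)))   # ''
print(flexible_slice("abcde", 5, 2, 1))       # edcb
print(flexible_slice("abcde", 1, 5, -2))      # ace
```

The reverse direction does not work: a flexible-only function never returns the empty string for slice('abcde', 5, 2, 1), so a caller who relies on the strict behaviour cannot recover it.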

Beliavsky commented 3 years ago

Can the slice function be ELEMENTAL? It appears to me that it can, since its arguments are scalars and it returns a scalar.

awvwgk commented 3 years ago

@Beliavsky Consider this example:

print *, slice("abcdef", 1, [1, 2, 3, 4, 5, 6], 1)

The resulting scalar character values cannot be combined into an array because they have different lengths (all elements of a Fortran character array must have the same length).

Carltoffel commented 3 years ago

Since this approach doesn't use Fortran indices but function arguments instead, we could also allow negative start/end values which will count from the end. There are two ways I can think of:

1. do it exactly like in Python, `-i` would then be a shortcut for `len(string)-i`
   `slice('abcd', end = -1)` returns `abc`

2. or keep the Fortran standard of indices starting at 1, but counted from the end: `-2` is the index of the letter 'c' in 'abcd'
   `slice('abcd', end = -2)` returns `abc`
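The two conventions differ only by one position. A minimal Python sketch of the mappings, assuming 1-based inclusive positions (the function names `python_style`, `counted_from_end` and `head` are hypothetical helpers, not proposed API):

```python
def python_style(string, i):
    # Option 1: a negative index -k is shorthand for len(string) - k.
    return len(string) + i if i < 0 else i


def counted_from_end(string, i):
    # Option 2: still 1-based, but counted from the end: -1 is the last character.
    return len(string) + i + 1 if i < 0 else i


def head(string, end):
    # Characters 1..end (1-based, inclusive) of string.
    return string[:end]


print(head("abcd", python_style("abcd", -1)))       # abc
print(head("abcd", counted_from_end("abcd", -2)))   # abc
print(head("abcd", counted_from_end("abcd", -1)))   # abcd
```

So the same `end = -1` selects `abc` under option 1 but the whole string `abcd` under option 2.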

ivan-pi commented 3 years ago

Here is the description of the extract function proposed in section 3.7.1 of the iso_varying_strings document:

3.7.1 EXTRACT(string [, start, finish])
Description. Extracts a specified substring from a string.
Class. Elemental function.
Arguments.
string shall be either of type VARYING_STRING or type default CHARACTER.
start (optional) shall be of type default INTEGER.
finish (optional) shall be of type default INTEGER.
Result Characteristics. Of type VARYING_STRING.
Result Value. The result value is a copy of the characters of the argument string between positions start and finish, inclusive. If start is absent or less than one, the value one is used for start. If finish is absent or greater than LEN(string), the value LEN(string) is used for finish. If finish is less than start, the result is a zero-length string.
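For comparison with slice, the EXTRACT rules quoted above can be sketched on a plain Python str. The name `extract` here is only a Python stand-in for the Fortran function; it returns a str rather than a VARYING_STRING and is not elemental. Positions are 1-based and inclusive.

```python
def extract(string, start=None, finish=None):
    """Python stand-in for EXTRACT as described in section 3.7.1.

    1-based, inclusive positions; start is clamped up to 1, finish is
    clamped down to len(string), and finish < start gives "".
    """
    if start is None or start < 1:
        start = 1
    if finish is None or finish > len(string):
        finish = len(string)
    if finish < start:
        return ""
    return string[start - 1:finish]


print(extract("abcdef", 2, 4))          # bcd
print(extract("abcdef", -3, 100))       # abcdef
print(repr(extract("abcdef", 5, 2)))    # ''
```

Unlike the proposed slice, EXTRACT has no stride, clamps out-of-range positions instead of leaving them to the caller, and never reverses: finish < start simply yields a zero-length string.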

ivan-pi commented 3 years ago

@Carltoffel, there is some discussion of the different indexing systems in https://github.com/fortran-lang/stdlib/pull/311#issuecomment-779417445. I'm not sure whether it is applicable here.