Closed kostrzewa closed 4 years ago
@marcuspetschlies I just noticed that we had you in the ETMC organisation with the wrong user name...
So you basically want to have this in C?
def parse(string):
    return list(map(float, string.split(', ')))
parse <- function (string) {
  # str_split (from stringr) returns a list, so take its first element.
  as.numeric(str_split(string, ', ')[[1]])
}
It's already in there, the question is why strtok tokenizes on the 'e'...
What does token = strtok(NULL, " =,\t"); do? This looks rather fishy.
Oh. Documentation:
Alternatively, a null pointer may be specified, in which case the function continues scanning where a previous successful call to the function ended.
What a nice API …
Sure, the C API is nasty for historical/efficiency reasons, but it's what we're stuck with in this case.
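To make the hidden state from that documentation snippet concrete, here is a minimal sketch. The function name is made up for illustration and is not from tmLQCD; it only shows that any other strtok call between the initial call and the strtok(NULL, ...) follow-up replaces the saved position.

```c
#include <string.h>

// Hypothetical demo (not tmLQCD code): counts the tokens of `outer`, but
// calls `strtok` on a second buffer in between. That call overwrites the
// position that `strtok(NULL, ...)` would otherwise continue from.
int count_tokens_with_interruption(char *outer, char *inner) {
  int count = 0;
  char *token = strtok(outer, ",");
  while (token != NULL) {
    ++count;
    // Restarts the internal state on `inner`.
    strtok(inner, ",");
    // Continues scanning in `inner`, not in `outer`.
    token = strtok(NULL, ",");
  }
  return count;
}
```

Called with mutable copies of "a,b,c" and "x,y", this counts 2 tokens rather than the 3 that are in the outer string, because the follow-up call picks up inside the second buffer.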
I would think that the error is something else. Using a minimal example I can happily split such an input string.
#include <stdio.h>
#include <string.h>

// The `strtok` function will insert `\0` terminators into the given string.
// This function will still print the whole string as we know the length,
// emitting a `\0` when it finds a `\0`.
void force_print(char const *const str,
                 int const len,
                 char const *const message) {
  printf("\n%s\n  ", message);
  for (int i = 0; i < len; ++i) {
    if (str[i] == '\0') {
      printf("\\0");
    } else {
      printf("%c", str[i]);
    }
  }
  printf("\n");
}

int main() {
  // We must have a mutable string for `strtok` to work.
  char input[] = "residual = 1e-3, 3.2e-4, 9.8e-12";
  int const len = strlen(input);
  force_print(input, len, "Our original string:");

  // We initialize the tokenization here with a copied pointer such that we can
  // still print out the whole original string.
  char *token;
  char *it = input;
  token = strtok(it, " =,\t");
  force_print(input, len, "After initial strtok:");
  printf("Token: `%s`\n", token);

  do {
    token = strtok(NULL, " =,\t");
    force_print(input, len, "After follow-up strtok:");
    printf("Token: `%s`\n", token);
  } while (token != NULL);

  return 0;
}
The output seems to be just what we want:
$ gcc -Wall -Wpedantic -fsanitize=address toc_test.c; and ./a.out
Our original string:
residual = 1e-3, 3.2e-4, 9.8e-12
After initial strtok:
residual\0= 1e-3, 3.2e-4, 9.8e-12
Token: `residual`
After follow-up strtok:
residual\0= 1e-3\0 3.2e-4, 9.8e-12
Token: `1e-3`
After follow-up strtok:
residual\0= 1e-3\0 3.2e-4\0 9.8e-12
Token: `3.2e-4`
After follow-up strtok:
residual\0= 1e-3\0 3.2e-4\0 9.8e-12
Token: `9.8e-12`
After follow-up strtok:
residual\0= 1e-3\0 3.2e-4\0 9.8e-12
Token: `(null)`
So I would say that the strtok function works as it should and therefore there must be something wrong somewhere else in the logic.
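Building on the MWE, a minimal sketch of how the tokens could then be turned into doubles. The name parse_double_list and its signature are assumptions for illustration, not the existing tmLQCD utility functions; the sketch only expects the value list itself (without the "residual =" part) and relies on strtod, which accepts scientific notation.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not the tmLQCD implementation): tokenizes `input` in
// place and converts every token with `strtod`. Returns the number of values
// written to `out`, or -1 if a token is not a complete number or if more than
// `max_out` values are present.
int parse_double_list(char *input, double *out, int max_out) {
  int count = 0;
  for (char *token = strtok(input, " ,\t"); token != NULL;
       token = strtok(NULL, " ,\t")) {
    char *end = NULL;
    double const value = strtod(token, &end);
    // `end` must have advanced to the terminator of the token, otherwise the
    // token was empty or had trailing characters that are not part of a number.
    if (end == token || *end != '\0') {
      return -1;
    }
    if (count >= max_out) {
      return -1;
    }
    out[count++] = value;
  }
  return count;
}
```

Given a mutable buffer holding "1e-3, 3.2e-4, 9.8e-12" and room for at least three values, this should return 3, since strtok only splits on the listed delimiters and never on the 'e' or the '-'.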
Thanks for the MWE, I agree that the problem must lie somewhere else then, although I also printed the tokens and it was splitting after the 'e'. Perhaps flex is mangling the input string somehow.
Alright, I got the bugger. The pattern for STRLIST was not defined correctly, causing flex to break the input string at the e for reasons that I don't fully understand. Thanks @martin-ueding.
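One possible mechanism, purely as an assumption since the actual STRLIST pattern is not shown in this thread: if the set of characters that a scanner rule accepts does not contain '-', the match for 1e-4 ends right after the e. The sketch below imitates such a rule in plain C with strspn; match_length and both character sets are hypothetical and are not the flex rule from tmLQCD.

```c
#include <string.h>

// Hypothetical stand-in for a scanner rule (not the actual flex pattern):
// returns the length of the longest prefix of `text` that consists only of
// characters from `allowed`.
size_t match_length(char const *text, char const *allowed) {
  return strspn(text, allowed);
}
```

With the assumed set "0123456789.eE", match_length("1e-4", ...) is 2, so only 1e is matched and the 4 is left over once the '-' is consumed elsewhere. With "0123456789.eE+-" the whole 1e-4 is matched (length 4).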
Wanted to enlist some help with solving a particularly annoying issue that has come up with the development of more parsing functionality for the tmLQCD QUDA interface.
Per MG-level parameters, which are specified as something like
are now set using parse_quda_mg_XXX_par_array utility functions, which nicely encapsulate the tokenization of the comma-separated list. However, for parsing lists of doubles (or integers), I've encountered the problem that the tokenizer does not want to work for scientific notation, splitting something like 1e-4 into 1e and 4.
See code and utility functions below. Any idea on how this could be resolved? This becomes quite annoying when specifying the setup solver tolerance, which is between 1e-8 and 1e-6, but also for the MGEigSolver functionality, small tolerances are required.