fuhsnn / slimcc

Small C11 compiler for x86-64 with GNU/C23 extensions
MIT License

Feedback for unicode.c #77

Open · mspiegel opened this issue 1 month ago

mspiegel commented 1 month ago

First, I want to say thank you for your work! I was looking for a version of chibicc that had continued development. The work you have done is very impressive.

I have two small pieces of feedback on unicode.c. The value '-1' is used as a sentinel at the end of arrays of uint32_t values. Since uint32_t is exactly 32 bits wide on every platform that provides it, -1 could be replaced with 0xFFFFFFFF in unicode.c. I must admit, I did not know that "unsigned int i = -1" is a common C idiom.
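A minimal sketch of how the suggested spelling could look. The table name `sketch_ranges`, its values, and the helper `sketch_ranges_len` are hypothetical and not taken from unicode.c. The sentinel is written as `UINT32_MAX` from `<stdint.h>`, which names 0xFFFFFFFF without a magic number.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range table: first-last pairs followed by one sentinel.
 * UINT32_MAX spells the sentinel explicitly instead of writing -1. */
static const uint32_t sketch_ranges[] = {
    0x00C0, 0x00D6,
    0x00D8, 0x00F6,
    UINT32_MAX,
};

/* uint32_t is an exact-width type, so its maximum is always 0xFFFFFFFF. */
static_assert(UINT32_MAX == 0xFFFFFFFFu, "uint32_t is exactly 32 bits wide");

/* Number of entries in the table, sentinel included. */
static size_t sketch_ranges_len(void) {
    return sizeof sketch_ranges / sizeof sketch_ranges[0];
}
```

Either spelling produces the same bit pattern; `UINT32_MAX` simply documents the intent at the point of use.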

The second piece of feedback is that in_ordered_range() walks the array with a stride of 2. The check for the sentinel value '-1' at the end of the array only works if the array length is odd. If the length is even, the sentinel sits at an odd index that the loop never inspects, so the loop reads past the end of the array. Can this be fixed by placing two consecutive -1 (or 0xFFFFFFFF) values at the end of the array instead of a single sentinel? With two sentinels, one of them will always be checked, whether the array length is even or odd.
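A hedged sketch of the two-sentinel idea. This is a hypothetical variant, not slimcc's actual in_ordered_range(); the names `RANGE_END` and `in_ranges_double_sentinel` are made up, and it assumes the pairs are sorted. One subtlety: if the loop only tested `t[i]`, an unpaired final entry would be paired with the first sentinel and treated as a range ending at 0xFFFFFFFF. Testing both elements of each pair avoids that, and the second sentinel guarantees `t[i + 1]` is always in bounds.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RANGE_END UINT32_MAX  /* hypothetical sentinel name */

/* The sentinel must never collide with a real code point. */
static_assert(RANGE_END > 0x10FFFF, "sentinel is outside the Unicode range");

/* Hypothetical stride-2 range check over sorted first-last pairs.
 * The table must end with TWO RANGE_END values. Stopping when either
 * element of the current pair is a sentinel keeps every read in bounds,
 * whether the data part has an even or an odd number of elements. */
static bool in_ranges_double_sentinel(const uint32_t *t, uint32_t c) {
    for (size_t i = 0; t[i] != RANGE_END && t[i + 1] != RANGE_END; i += 2) {
        if (c < t[i])
            return false;  /* pairs are ordered: no later pair can match */
        if (c <= t[i + 1])
            return true;   /* t[i] <= c <= t[i + 1] */
    }
    return false;
}
```

With this layout, a malformed table that has a dangling unpaired entry stops early instead of overrunning the array; the dangling entry is simply ignored.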

fuhsnn commented 1 month ago

> I did not know that "unsigned int i = -1" is a common C idiom

Yes, this assumes two's complement and that the integer conversion rules hold. I can see how it would be confusing.
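As a side note, the C standard (C11 6.3.1.3p2) defines conversion of an out-of-range value to an unsigned type as repeatedly adding or subtracting one more than the type's maximum, so `unsigned int i = -1` yields `UINT_MAX` regardless of how signed integers are represented in memory. The helper `wrap_to_unsigned` below is a hypothetical name used only to illustrate the rule.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stdint.h>

/* Converting -1 to an unsigned type adds (MAX + 1) once, giving MAX.
 * This follows from the conversion rule alone, not from the bit
 * representation of the signed value -1. */
static_assert((unsigned int)-1 == UINT_MAX, "unsigned int");
static_assert((unsigned long long)-1 == ULLONG_MAX, "unsigned long long");
static_assert((uint32_t)-1 == 0xFFFFFFFFu, "uint32_t");

/* The same reduction applies to other negative values: -2 becomes MAX - 1. */
static unsigned int wrap_to_unsigned(int v) {
    return (unsigned int)v;
}
```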

> in_ordered_range()

The function is only intended for the existing static tables in unicode.c, which are laid out as first-last pairs. I'll clarify the usage in a comment.
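One possible way to make the "first-last pairs plus one sentinel" layout self-checking, sketched under assumptions: the macro `ARRAY_LEN`, the table `sketch_table`, and `sketch_table_is_ordered` are hypothetical and not part of unicode.c. Because pairs plus a single sentinel always give an odd length, a `static_assert` can reject a table with an accidentally unpaired entry at compile time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table in the documented layout: first-last pairs, then -1. */
static const uint32_t sketch_table[] = {
    0x0300, 0x036F,
    0x1AB0, 0x1AFF,
    (uint32_t)-1,
};

/* Pairs plus one sentinel means the length must be odd; an unpaired entry
 * would make it even and let a stride-2 scan step over the sentinel. */
static_assert(ARRAY_LEN(sketch_table) % 2 == 1,
              "sketch_table must hold first-last pairs plus one sentinel");

/* Runtime check that each pair is ordered and the pairs are ascending. */
static bool sketch_table_is_ordered(void) {
    size_t pairs = ARRAY_LEN(sketch_table) / 2;  /* integer division drops the sentinel */
    for (size_t p = 0; p < pairs; p++) {
        uint32_t first = sketch_table[2 * p];
        uint32_t last = sketch_table[2 * p + 1];
        if (first > last)
            return false;
        if (p > 0 && first <= sketch_table[2 * p - 1])
            return false;  /* overlaps or precedes the previous pair */
    }
    return true;
}
```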

For processing arbitrary input, I think it's better to iterate with an explicit type and an explicit array size, like:

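#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
struct UTF32_Codepoint_Range {
  uint32_t first;
  uint32_t last;
};
/* Range check over an explicit array of ranges and its length (fn is a placeholder name). */
bool fn(struct UTF32_Codepoint_Range *, size_t);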
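A minimal sketch of how a function with that shape might be implemented. The prototype above leaves out the code point being tested, so the extra `uint32_t c` parameter and the name `codepoint_in_ranges` are assumptions. It also assumes the ranges are sorted by `first` and do not overlap, which makes a binary search possible; the explicit count `n` bounds every access, so no sentinel is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct UTF32_Codepoint_Range {
    uint32_t first;
    uint32_t last;
};

/* Hypothetical implementation: binary search over n sorted,
 * non-overlapping inclusive ranges. */
static bool codepoint_in_ranges(const struct UTF32_Codepoint_Range *ranges,
                                size_t n, uint32_t c) {
    /* Precondition: a null pointer is only acceptable for an empty table. */
    assert(ranges != NULL || n == 0);

    size_t lo = 0, hi = n;  /* search the half-open interval [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < ranges[mid].first)
            hi = mid;       /* c lies before this range */
        else if (c > ranges[mid].last)
            lo = mid + 1;   /* c lies after this range */
        else
            return true;    /* first <= c <= last */
    }
    return false;
}
```

For example, with `{{0x41, 0x5A}, {0x61, 0x7A}}` and `n = 2`, the call returns true for 0x61 and false for 0x5B. A linear scan over the same struct array would work equally well for short tables; binary search only pays off as the table grows.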