Closed jinz2014 closed 1 week ago
__device__
loc_ht& ht_get_atomic(loc_ht* thread_ht, cstr_type kmer_key, uint32_t max_size){
    unsigned hash_val = MurmurHashAligned2(kmer_key, max_size);
    unsigned orig_hash = hash_val;
    while(true){
        // try to claim the slot: the CAS succeeds only if its key length is still EMPTY
        int prev = atomicCAS(&thread_ht[hash_val].key.length, EMPTY, kmer_key.length);
        // all the threads in the warp which have the same address
        unsigned mask = __match_any_sync(__activemask(), (unsigned long long)&thread_ht[hash_val]);
        if(prev == EMPTY){
            // the thread that claimed the slot initializes the key pointer and the value
            thread_ht[hash_val].key.start_ptr = kmer_key.start_ptr;
            thread_ht[hash_val].val = {.hi_q_exts = {0}, .low_q_exts = {0}, .ext = 0, .count = 0};
        }
        __syncwarp(mask);
        if(prev != EMPTY && thread_ht[hash_val].key == kmer_key){
            //printf("key found, returning\n"); // keep this for debugging
            return thread_ht[hash_val];
        }else if (prev == EMPTY){
            return thread_ht[hash_val];
        }
        hash_val = (hash_val + 1) % max_size; //hash_val = (hash_val + 1) & (HT_SIZE -1);
        if(hash_val == orig_hash){ // loop till you reach the same starting position, then report the error
            printf("*****end reached, hashtable full*****\n"); // for debugging
        }
    }
}
The original CUDA program is https://github.com/leannmlindsey/gpu_local_ht
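For readers who want to follow the probing logic in the kernel above without a GPU, here is a minimal host-only C++ sketch. It is not the code from the linked repository: `Slot`, `ht_get`, the integer keys, and the modulo "hash" are hypothetical stand-ins for `loc_ht`, `ht_get_atomic`, `cstr_type`, and `MurmurHashAligned2`, and `std::atomic::compare_exchange_strong` takes the place of `atomicCAS`. It leaves out the warp-level `__match_any_sync`/`__syncwarp` step entirely, so it says nothing about the HIP question itself. Unlike the kernel, it returns `nullptr` when the table is full instead of continuing to loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel mirroring the EMPTY marker used in the kernel.
constexpr int EMPTY = -1;

// Simplified slot: an int key instead of cstr_type, a counter instead of the value struct.
struct Slot {
    std::atomic<int> key{EMPTY};
    int count = 0;
};

// Returns the slot that owns `key`, claiming an empty slot if necessary,
// or nullptr once every slot has been probed without success (table full).
Slot* ht_get(std::vector<Slot>& table, int key) {
    assert(key >= 0 && !table.empty());  // assumption: non-negative keys, non-empty table
    const std::size_t size = table.size();
    const std::size_t start = static_cast<std::size_t>(key) % size;  // toy hash, not Murmur
    std::size_t pos = start;
    do {
        int expected = EMPTY;
        // Host analogue of atomicCAS: claim the slot only if it is still EMPTY.
        // On failure, `expected` holds the key currently stored in the slot.
        const bool claimed = table[pos].key.compare_exchange_strong(expected, key);
        if (claimed || expected == key) {
            return &table[pos];
        }
        pos = (pos + 1) % size;  // linear probing, as in the kernel
    } while (pos != start);
    return nullptr;  // wrapped around to the starting slot
}
```

Returning a pointer rather than a reference is a deliberate choice in this sketch: it gives the full-table case a value the caller can check.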
Hi @jinz2014. An internal ticket has been created to assist with your question. Thanks!
Hi @jinz2014, __syncwarp() is one CUDA function that HIP doesn't provide a direct equivalent for. You have a couple of options:
What is the HIP version of __syncwarp(mask)?