CoinFuMasterShifu / janusminer

MIT License

HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION #10

Closed username-rand closed 6 months ago

username-rand commented 6 months ago
  1. HiveOS 0.6-225@240129
  2. AMD driver (default) 5.4.6 (6.1.2307)
  3. janusminer_hiveos.0.0.13
  4. GPU AMD BC-250
  5. GPU CPU OC 1400/750 MEM 1000

Miner started at:

Janushash Miner (By CoinFuMasterShifu) ⚒ ⛏
[2024-01-30 16:00:30.234] [warning] Stratum parameter '-u' is ignored because direct-to-node mining is enabled via '-a'
OpenCL installations for the following GPUs were detected:

Using all GPUs.
Using 12 CPU threads for Verushash
[2024-01-30 16:00:30.283] [info] Node RPC is 192.168.3.43:3000

Crashed at:

[2024-01-30 16:04:38.855] [info] Janusscore: 57.790685 mh/s
[2024-01-30 16:04:46.411] [warning] CPU queue drained
:0:rocdevice.cpp :2672: 9751346262 us: 804147: [tid:0x7f267c61d700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29

01/31/2024: Removed the memory overclock yesterday. The only overclocks now are GPU 1400/750. Still getting failures every 5-30 minutes with janusminer-0.0.13.

This morning's latest failure:

[2024-01-31 08:42:22.170] [info] Thread#11: 934.381000 kh/s
[2024-01-31 08:42:22.170] [info] Janusscore: 58.043071 mh/s
:0:rocdevice.cpp :2672: 69608825994 us: 1672875: [tid:0x7f8f4181d700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29

Tried the hiveosbeta miner 0.0.13. Also failed after mining a while.

hiveosbeta-0.0.13 log beginning:

Janushash Miner (By CoinFuMasterShifu) ⚒ ⛏
OpenCL installations for the following GPUs were detected:

Using all GPUs.
Using 12 CPU threads for Verushash
[2024-01-31 09:11:33.521] [info] Node RPC is 192.168.3.43:3000
[2024-01-31 09:11:41.857] [warning] CPU queue drained
[2024-01-31 09:11:51.352] [info] Total hashrate (GPU): 217.244836 mh/s
[2024-01-31 09:11:51.352] [info] gfx1013:xnack-: 217.244836 mh/s
[2024-01-31 09:11:51.352] [info] Total hashrate (CPU): 10.574412 mh/s
[2024-01-31 09:11:51.352] [info] Thread#0: 859.670000 kh/s
[2024-01-31 09:11:51.352] [info] Thread#1: 813.949000 kh/s
[2024

hiveosbeta-0.0.13 log at failure:

[2024-01-31 09:28:54.280] [info] Thread#8: 892.980000 kh/s
[2024-01-31 09:28:54.280] [info] Thread#9: 882.697000 kh/s
[2024-01-31 09:28:54.280] [info] Thread#10: 906.225000 kh/s
[2024-01-31 09:28:54.280] [info] Thread#11: 875.293000 kh/s
[2024-01-31 09:28:54.280] [info] Janusscore: 57.825421 mh/s
:0:rocdevice.cpp :2672: 72407039462 us: 1843781: [tid:0x7fa5cba1d700] Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29

Only overclocks are Core 1400 VDD 750 (Removed the memory overclock)
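
For anyone comparing runs, the time to failure can be estimated from the bracketed timestamps in the miner log. The following is only a rough Python sketch, not part of janusminer: the function name `seconds_between_first_and_last` is hypothetical, and it assumes every timestamped line uses the `[YYYY-MM-DD HH:MM:SS.mmm]` format seen above. The `rocdevice.cpp` abort line carries no such timestamp, so the last timestamped line is used as an approximation of the crash time.

```python
import re
from datetime import datetime

# Matches the bracketed timestamps seen in the logs,
# e.g. [2024-01-31 09:11:33.521]
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")


def seconds_between_first_and_last(log_text: str) -> float:
    """Return the seconds between the first and last timestamp in log_text."""
    stamps = [
        datetime.strptime(match, "%Y-%m-%d %H:%M:%S.%f")
        for match in TIMESTAMP.findall(log_text)
    ]
    if len(stamps) < 2:
        raise ValueError("need at least two timestamped lines")
    return (stamps[-1] - stamps[0]).total_seconds()


# Two lines taken from the hiveosbeta-0.0.13 run: startup and last report.
sample = """\
[2024-01-31 09:11:33.521] [info] Node RPC is 192.168.3.43:3000
[2024-01-31 09:28:54.280] [info] Janusscore: 57.825421 mh/s
"""

elapsed = seconds_between_first_and_last(sample)
print(f"ran for about {elapsed / 60:.1f} minutes before the abort")
```

For the hiveosbeta-0.0.13 run this gives a little over 17 minutes, which falls inside the 5-30 minute failure window reported earlier.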

username-rand commented 6 months ago

Someone suggested changing -t (--threads). I had been using either the default or specifying -t 0 (which I understand is also the default). The miner was using all 12 threads and failing. I changed the threads option to --threads=10 on both the overclocked and non-overclocked cards and reran hiveosbeta. The miner has been running all night without issue. Apparently leaving a couple of threads free solves this issue.

CoinFuMasterShifu commented 6 months ago

Interesting. 0 is the default and means max threads supported by hardware concurrency. Do you still have problems now after the big update from version 0.1.0 on?

username-rand commented 6 months ago

Updated all BC-250s to the 0.1.4 hiveosbeta version of the miner. All BC-250s are running HiveOS version 0.6-225@240204 and AMD driver 525.125.06. -t 0 was set on all BC-250s. The same flight sheet is used for all BC-250s, so all are configured the same. 10 of the 12 BC-250s started mining, but I don't know if they will last: with the previous version they also started and then failed later. Two failed to start.

The first card hung with the following in the log:

Janushash Miner (By CoinFuMasterShifu) ⚒ ⛏
[2024-02-07 09:57:46.536] [warning] Stratum parameter '-u' is ignored because direct-to-node mining is enabled via '-a'
OpenCL installations for the following GPUs were detected:

Using all GPUs.
Using 12 CPU threads for Verushash
[2024-02-07 09:57:46.586] [info] Node RPC is 192.168.3.43:3000
terminate called after throwing an instance of 'cl::Error'
what(): clCreateCommandQueueWithProperties

I then restarted the miner and it hung again with this log:

Janushash Miner (By CoinFuMasterShifu) ⚒ ⛏
[2024-02-07 10:00:18.803] [warning] Stratum parameter '-u' is ignored because direct-to-node mining is enabled via '-a'
OpenCL installations for the following GPUs were detected:

Using all GPUs.
Using 12 CPU threads for Verushash
[2024-02-07 10:00:18.854] [info] Node RPC is 192.168.3.43:3000

The second miner also hung with these log entries:

Janushash Miner (By CoinFuMasterShifu) ⚒ ⛏
[2024-02-07 10:01:12.416] [warning] Stratum parameter '-u' is ignored because direct-to-node mining is enabled via '-a'
OpenCL installations for the following GPUs were detected:

Using all GPUs.
Using 12 CPU threads for Verushash
[2024-02-07 10:01:12.466] [info] Node RPC is 192.168.3.43:3000

I suggest putting the version number after "Miner" on the first line so it is clear which version is running.

Let me know if there is anything else I should try. I don't know whether it is better to run --threads 10 or -t 0, but I think I'll go back to --threads=10.

username-rand commented 6 months ago

One more piece of information: after rebooting those two BC-250s, all miners are now running with -t 0. Is it better to run -t 0 or with --threads?

CoinFuMasterShifu commented 6 months ago

-t is short for --threads. It accepts a number; 0 means auto-detect the maximum number of threads the hardware supports in parallel.
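
To illustrate the behaviour described here, a minimal Python sketch of such an option follows. It is not the miner's actual code: `resolve_threads` and `parse_threads` are hypothetical names, and `os.cpu_count()` stands in for whatever hardware-concurrency query the miner really uses. It only shows the idea that `-t` and `--threads` are the same option and that 0 resolves to the detected thread count, while any other number is used as given.

```python
import argparse
import os


def resolve_threads(requested: int) -> int:
    """Map 0 to the detected hardware thread count; pass other values through."""
    if requested < 0:
        raise ValueError("thread count must be >= 0")
    if requested == 0:
        # os.cpu_count() may return None when the count cannot be determined,
        # so fall back to a single thread in that case.
        return os.cpu_count() or 1
    return requested


def parse_threads(argv: list[str]) -> int:
    """Parse -t/--threads (default 0 = auto-detect) from an argument list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--threads", type=int, default=0)
    return resolve_threads(parser.parse_args(argv).threads)


print(parse_threads(["-t", "0"]))          # detected hardware thread count
print(parse_threads(["--threads=10"]))     # explicit value is used as given
```

Under this sketch, `-t 0` on a 12-thread machine and `--threads=12` would be equivalent, while `--threads=10` leaves two hardware threads free, which is the setting that ran overnight without the abort earlier in this thread.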

username-rand commented 6 months ago

Thanks. Still have a few occasions where miner hashrate goes to zero but think it is because of overclocks. Thanks for your help!